Shannon of Bell Laboratories devised an experiment to illustrate the capabilities of
telephone relays. Here, an electrical mouse finds its way unerringly through a maze,
guided by information remembered in the kind of switching relays used in dial telephone
systems. Experiments with the mouse helped stimulate Bell Laboratories researchers to
think of new ways to use the logical powers of computers for operations other than
numerical calculation."

Photograph of Claude Shannon and Dave Hagelbarger at Bell Labs in March 1955.
Caption: "Claude Shannon, the originator of Information Theory, at the board and Dave
Hagelbarger work out some equations needed. Their current projects include work on
automata, advanced types of computing machines which are able to perform various
thought functions."

Photograph of Claude Shannon taken in the 1980s. Photographer unknown.
ABSTRACT


A study is made of the possibilities of using
the Lakatos-Hickman type relay for the counting, registering,
steering, and pulse apportioning operations in
a subscriber sender. Circuits are shown for the more
important parts of the circuit where it appears that the
new type relay would effect an economy.
August 15, 1940


MEMORANDUM  FOR  ITU 

The Lakatos-Hickman type relay¹, using the relay springs
as part of the magnetic circuit, can be used as a very economical
type of pulse counter and registration device. In fact, one such
relay with twenty moving springs can count and register up to ten
pulses, while the same operation requires at least five ordinary
relays, and some standard circuits use as many as twenty to reduce
the spring loading on the relays and the contact loading in
the pulsing circuit. It has been suggested that this new type
of relay might be used for some or all of the many counting,
steering, and registration circuits in a subscriber type sender.
The present memorandum gives some circuits for accomplishing
this. The chief problem in the design of these circuits is
that of performing the various translating operations necessary
in converting the incoming pulses into group and brush selections,
or P.C.I. pulses as the case may be, without using more contact
elements than are available on the counting relay. Two different
solutions are given here. The first was made as economical as
possible but at the cost of one disadvantage. Under certain
conditions of contact failure in the thousands or hundreds register
the sender will connect the subscriber to an incorrect number
rather than connecting to a tell-tale and giving him a busy signal.
The second circuit, which we will call the positive action
circuit², is designed to overcome this difficulty but does so at
the expense of more contacts and wiring. Some compromise between
these circuits may be the most desirable. The circuits by no
means represent a complete sender. It appears that the problems
connected with the office code (i.e. the first two or three
digits) can be handled without much difficulty. At any rate
these circuits will depend on the type of decoder used, and
would represent a second stage in the design. We have therefore
designed what might be called a "four digit sender" considering
only the problems arising in the thousands, hundreds, tens and
units digits. We also have omitted consideration of the parts
of the circuit used for control and supervisory purposes, since
these can be easily handled by existing circuits, and do not
directly involve the new type relay. Our chief purpose is to
show that the new type counter contains sufficient contact
elements for most of the steering and counting circuits of the
subscriber sender. It is always possible to add more contacts
at any stage in the new type counter by the arrangement of
springs in Fig. 1, but this would be undesirable from the
standpoint of standardization. At any rate it was found that
even in the positive action circuit, only two stages in one
register needed more contacts than are already available, and
two additional ordinary relays were introduced here to carry the
contact load.

¹See "Circuit Analysis for Lakatos-Hickman Type Relay",
G. R. Stibitz, MM-40-130-180, Jan. 15, 1940, Case 20878.

²This circuit was suggested by Mr. O. T. King.

It should be pointed out that an extremely simple and
economical sender (i.e., much simpler than those given here)
could be designed using the new type counter were it not for
the peculiar translation codes involved. Thus if we could start
"from scratch" and design translation codes particularly adapted
to the characteristics of the new relay, the circuits could be
made very simple indeed. Even using the existing codes, which
were constructed to simplify the present type circuits, the use
of the new counter allows a remarkable simplicity and economy.

The circuits were designed by a combination of common
sense and Boolean algebra methods. We will omit the details
involved in their design. Although it is possible that a few
superfluous elements remain, it is doubtful if they can be
simplified very much.

Figure 2 is a block diagram of the proposed sender.
In the present panel and crossbar senders, pulse counting is
done in the same circuit for each digit and the numbers transferred
from this counting circuit to a set of registering circuits,
one for each digit, through an incoming steering chain.
The registering circuits in the panel type sender consist of a
set of five ordinary relays per digit, while in the crossbar
system the A digit is registered on one or two verticals of a
crossbar switch. In Figure 2, on the other hand, each digit
has one of the new type counter relays which acts both as a
pulse counter and as a register. The incoming steering chain
steers the incoming pulses to the correct counter-register
rather than steering the number recorded by the input pulse
counter to a digit register. The input steering chain may or
may not be one of the new type counters. The steering operation
can be done with the new type counter, but it appears to
require special devices, as for example polarised springs, in
order to energize both windings of the register relays after
receiving a digit. Even using the present type of steering
chain a great simplification is possible, for only one wire,
the pulsing lead, needs to be steered to the various digit
registers, rather than the five leads of the present type
sender. Another possibility is using a new type counter to
count the groups of pulses and operate a set of relays S_A, S_B,
S_C, S_Th, S_H, S_T, S_U which come in after the A, B, C, Th, H, T,
and U digits are received and energize both coils of the
corresponding registers.


After the digits are registered on the new type
counters, these numbers are translated by means of the contact
interconnections into the code corresponding to the incoming
brush, incoming group, final brush, tens, and units selections,
which are represented by a ground on one of the leads in the
groups marked IB, IG, FB, T, and U, respectively. These groups
of leads are connected in sequence to the revertive pulse counter
by means of the revertive group counter. The revertive pulse
counter will be one of the new type relays and is connected in
such a way as to open the fundamental circuit and thus stop the
revertive pulsing when it reaches the first ground. The revertive
group counter or revertive steering chain, of course, steps ahead
after each group of revertive pulses through the action of a slow
release relay. This last steering operation cannot be done solely
with one of the new type relays for it is necessary to steer ten
leads in the tens and units digits. It could be done, however,
with a new type counter in conjunction with four ordinary relays.

In the case of a call to a manual office the outputs
of the digit registers are translated by a P.C.I. circuit into
the correct P.C.I. codes. This circuit, too, can make use of the
new type counter in the quadranting operation, i.e. in apportioning
four quadrants to each of the four digits to be transmitted.
This would be done with a sixteen stage counter (or if it is
desirable to have all counters with ten stages, two of these could
be connected "in series") replacing the present sequence switch.

Of course there must be an interlock between the incoming
and revertive steering chains to prevent any selection being
made before sufficient information has been received. This can
be done by fairly standard methods.

A rough comparison can be made between the relay
requirements of the present panel type sender and the design
proposed here. Omitting parts of the circuit which would be
substantially the same, the requirements are listed below:

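                         Present
                       Panel Sender          Proposed Sender

                         Ordinary         New Type       Ordinary
Operation                 Relays          Counters        Relays

Input Counting          [illegible]     [illegible]    [illegible]
Input Steering          [illegible]     [illegible]    [illegible]
Registration            [illegible]     [illegible]    [illegible]
Revertive Counting      [illegible]     [illegible]    [illegible]
Revertive Steering          10          [illegible]    [illegible]

Total                   [illegible]     [illegible]    [illegible]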


In addition, a sequence switch is replaced by a new type counter.
These figures are based on the positive action circuit. The
other circuit uses 6 ordinary relays. This comparison of the
numbers of relays involved shows only a small part of the saving,
however. The wiring and fundamental method of operation of the
new circuit are much simpler, which tends both toward economy and,
providing the new relay can be made sufficiently reliable,
elimination of faults and errors.

It is a little more difficult to give a quantitative
comparison of the proposed sender with the present crossbar type
sender due to the differences in the types of circuit elements
involved, but it appears that the saving would be of the same order
of magnitude.

The new type counter with ten stages acts like a series
of twenty relays which come in sequentially as the two coils of
the relay are alternately energized. Thus after n pulses the
first 2n relays are operated. If, after a series of pulses, only
one of the two coils on a counter remains energized we can only
be sure of the contacts on that side. It was found that under
these conditions the number of contacts available was far too
small in all of the four registers for the various translating
operations necessary. We have therefore assumed the steering
circuit should be designed in such a way as to energize both
coils of a counter after it has received its series of pulses.*
This insures the contacts on both sides, and each stage then has
the equivalent of two transfer contacts and two additional
contacts somewhat similar to a switchhook connection. Thus each
stage may be considered as a relay with the contacts available
indicated in Figure 3. Our circuit diagrams are drawn from
this point of view.
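The counting behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of the relay's construction: the function names `operated_relays` and `registered_count` are hypothetical, and the sketch assumes that both coils are energized after the pulse train, so that all 2n equivalent relays can be relied upon.

```python
def operated_relays(pulses, stages=10):
    """Model a new type counter as 2 * stages equivalent relays.

    Returns a list of flags, True for every equivalent relay that has
    come in. After n pulses the first 2n relays are operated
    (assumption: both coils are held energized after the pulse train).
    """
    if not 0 <= pulses <= stages:
        raise ValueError("pulse count exceeds the number of stages")
    return [index < 2 * pulses for index in range(2 * stages)]


def registered_count(relays):
    """Recover the registered digit from the operated relays."""
    return sum(relays) // 2


if __name__ == "__main__":
    state = operated_relays(6)
    print(sum(state), "relays operated; digit registered:", registered_count(state))
```

The same relay therefore serves as counter and register: the pulse count is never transferred elsewhere, it is simply read back from the contacts.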

For the convenience of the reader we will list the
various translation codes used in the sender. The incoming
brush selection depends only on the thousands digit and is
given by the following table:

Thousands        Incoming Brush
Digit            Selection

0, 1             0
2, 3             1
4, 5             2
6, 7             3
8, 9             4

*See the memorandum "Circuit Arrangement for Counting Relay with
Mechanically Independent Contact Springs", by B. D. Holbrook,
MM-40-130-149, July 5, 1940, Case 22108-1.


The incoming group selection depends on both the
hundreds and thousands digits and is given by the following:

Thousands        Hundreds         Incoming Group
Digit            Digit            Selection

even             < 5              0
even             ≥ 5              1
odd              < 5              2
odd              ≥ 5              3


The final brush selection depends only on the hundreds
digit. We have the following code:

Hundreds         Final Brush
Digit            Selection

0, 5             0
1, 6             1
2, 7             2
3, 8             3
4, 9             4
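The three translation codes can be summarized arithmetically. The sketch below is an illustration only: it assumes the digit pairings read from the tables (thousands digits paired as 0-1, 2-3, and so on; hundreds digits split at 5 for the group and taken modulo 5 for the final brush), and the function names are hypothetical. It also assumes the convention that a selection of n is made by n + 1 revertive pulses.

```python
def translate(thousands, hundreds):
    """Return (incoming brush, incoming group, final brush) selections."""
    incoming_brush = thousands // 2              # pairs 0-1, 2-3, ..., 8-9
    incoming_group = 2 * (thousands % 2) + (1 if hundreds >= 5 else 0)
    final_brush = hundreds % 5                   # 0 and 5 -> 0, 1 and 6 -> 1, ...
    return incoming_brush, incoming_group, final_brush


def revertive_pulses(selection):
    """A selection of n corresponds to n + 1 revertive pulses."""
    return selection + 1


if __name__ == "__main__":
    brush, group, final = translate(6, 7)
    print(brush, group, final)
    print([revertive_pulses(s) for s in (brush, group, final)])
```

Under these assumptions the three selections together identify the block of one hundred lines: 2000 times the brush, plus 500 times the group, plus 100 times the final brush equals the number formed by the thousands and hundreds digits.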


It should be remembered that an incoming brush, incoming
group, or final brush selection of n corresponds to n + 1
revertive pulses. The same remark applies to the tens and
hundreds selection.

Digits are sent to a call indicator by series of positive
and negative pulses, four for each digit. Two different
codes are used for this, one for the thousands digit and the
other for the hundreds, tens, and units. The thousands code is
an additive one based on the numbers 1, 2, 4, and 8 as follows:

P.C.I. Code for Thousands Digit

Thousands                  Quadrant
Digit              I       II      III     IV

1                  -       0       0       0
2                  0       -       0       0
3                  -       -       0       0
4                  0       0       -       0
5                  -       0       -       0
6                  0       -       -       0
7                  -       -       -       0
8                  0       0       0       -
9                  -       0       0       -
0                  0       0       0       0

Corresponding
Additive
Numbers           (1)     (2)     (4)     (8)
The sum of the numbers corresponding to the columns in which a
digit has the symbol - gives that digit, hence the additive
property of the code. In this table I, II, III, and IV refer
to the four pulses or quadrants. In the first and third quadrants
0 represents a ground and a - represents a positive pulse. In the
even quadrants 0 means a light negative pulse and the -, a heavy
negative pulse. We have chosen this representation of the code
for comparison with the P.C.I. circuit in which four leads are
grounded or not in accordance with the above table. Thus if the
digit 6 is registered in the thousands place, leads II and III in
a group I, II, III, IV are grounded. The presence or absence of
these grounds is translated into positive or negative pulses by
two relays TS and RS.

The hundreds, tens, and units P.C.I. code is also additive,
based on the numbers 1, 2, 4, 5. Using the same conventions
it is represented by the following table:

P.C.I. Code for Hundreds, Tens, and Units Digits

H, T, or                   Quadrant
U Digit            I       II      III     IV

1                  -       0       0       0
2                  0       -       0       0
3                  -       -       0       0
4                  0       0       -       0
5                  0       0       0       -
6                  0       -       -       0
7                  0       -       0       -
8                  -       -       0       -
9                  0       0       -       -
0                  0       0       0       0

Corresponding
Numbers           (1)     (2)     (4)     (5)
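The additive property of the two P.C.I. codes can be checked mechanically. The following sketch is an illustration under stated assumptions: the quadrant weights are taken in the order I, II, III, IV as (1, 2, 4, 8) for the thousands digit and (1, 2, 4, 5) for the other digits, a marked quadrant is written "-" and an unmarked one "0", and the names `decode` and `encode_thousands` are hypothetical.

```python
THOUSANDS_WEIGHTS = (1, 2, 4, 8)   # assumed weights of quadrants I..IV
HTU_WEIGHTS = (1, 2, 4, 5)         # assumed weights for hundreds, tens, units


def decode(marks, weights):
    """Sum the weights of the quadrants marked '-' to recover the digit."""
    return sum(weight for mark, weight in zip(marks, weights) if mark == "-")


def encode_thousands(digit):
    """Mark the quadrants whose weights add up to the thousands digit.

    With weights 1, 2, 4, 8 this is simply the binary expansion of the digit.
    """
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    return tuple("-" if digit & weight else "0" for weight in THOUSANDS_WEIGHTS)


if __name__ == "__main__":
    print(encode_thousands(6))                        # quadrants II and III marked
    print(decode(("0", "-", "0", "-"), HTU_WEIGHTS))  # II + IV = 2 + 5
```

No encoder is given for the 1, 2, 4, 5 code because that code is not unique (6 can be formed as 1 + 5 or as 2 + 4), so the table itself, not arithmetic, fixes the choice.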

The circuit for the tens or units register is shown in Figure 4.
The operation is quite obvious. In the case of a full mechanical
call, if 6 for example were dialed in the tens place, the first
six relays are locked in, which places a ground on the lead marked
6. These are connected through the revertive steering chain to
the revertive counter which reaches this ground after the seventh
revertive pulse. The presence of this ground operates a relay
which opens the fundamental circuit and stops the pulsing.
A ground is also put on leads II and III for a P.C.I. call.
The operation of the P.C.I. circuit will be described later.
The thousands and hundreds register is shown in Figure 5 for the
positive action circuit and in Figure 6 for the more economical
circuit. In Figure 5, many of the contacts do double duty,
translating both for P.C.I. and full mechanical calls. This is
done through a relay P which is operated for a manual call and
not for a mechanical call. In the hundreds register there were
not enough contacts available in the fifth and tenth stages.
The relays R and S are used to carry part of the contact load.
This circuit is designed so that one and only one of the IB, IG,
and FB leads is grounded for a given number. In case of a contact
failure none would be grounded and the corresponding commutator
would supposedly go to a telltale. In the circuit of Figure
6, on the other hand, more than one of the IB, IG, or FB leads may
be grounded at the same time. Thus if the thousands digit is 6,
both 3 and 4 in the IB group are grounded. If the back contact
on 3 failed the revertive pulse counter would not stop the pulsing
action at brush 3 as it should but would go on to the fourth brush.
However, this circuit is considerably simpler than Figure 5, and
does not appear worse from the standpoint of possible wrong numbers
than the present type of sender.
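The difference between the two register circuits under a contact failure can be illustrated with a small model of the revertive pulse counter, which stops at the first grounded lead it reaches. This is a sketch only: the function name, the use of a set of grounded lead numbers, and the value `None` standing for a trip to the tell-tale are all assumptions made for illustration.

```python
TELL_TALE = None  # hypothetical marker: no ground found, commutator runs to tell-tale


def pulses_until_stop(grounded_leads, lead_count=5):
    """Count revertive pulses until the counter meets the first grounded lead.

    Lead n is reached after n + 1 pulses. If no lead is grounded the
    pulsing is never stopped by the counter and TELL_TALE is returned.
    """
    for lead in range(lead_count):
        if lead in grounded_leads:
            return lead + 1
    return TELL_TALE


if __name__ == "__main__":
    # Positive action circuit: exactly one lead grounded, none after a failure.
    print(pulses_until_stop({3}), pulses_until_stop(set()))
    # Economical circuit: leads 3 and 4 both grounded; if the contact for
    # lead 3 fails, the counter runs on to lead 4 and selects a wrong brush.
    print(pulses_until_stop({3, 4}), pulses_until_stop({4}))
```

In the first case a failure is detected (no stop, hence a tell-tale), while in the second it silently yields one pulse too many, which is the wrong-number condition described above.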

The P.C.I. circuit is shown in Figure 7. Z is a relay
which is operated in the odd quadrants and not in the even
quadrants. TS and RS are relays whose windings are connected
sequentially through the P.C.I. impulse chain to first the thousands
P.C.I. leads I, II, III, and IV, then the hundreds, etc., according
to the following table:

            Pulsing
Digit       Stage       Z       TS          RS

Th           1          Z       Th I        Th II
             2          -       Th III      Th II
             3          Z       Th III      Th IV
             4          -       H I         Th IV

H            5          Z       H I         H II
             6          -       H III       H II
             7          Z       H III       H IV
             8          -       T I         H IV

T            9          Z       T I         T II
            10          -       T III       T II
            11          Z       T III       T IV
            12          -       U I         T IV

U           13          Z       U I         U II
            14          -       U III       U II
            15          Z       U III       U IV
            16          -                   U IV

In the odd quadrants Z is operated, placing a ground on the
fundamental ring (FR). The fundamental tip (FT) is connected
through Z to either ground or positive battery according as
TS is operated or not. This depends of course on the condition
of the P.C.I. lead to which TS is connected at the time.
Similarly in the even quadrants light or heavy voltage is
applied to FR according to the condition of RS while FT is
grounded.
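The connection sequence of TS and RS follows a simple rule, which the sketch below reproduces. It is an illustration under assumptions: the sixteen quadrants are taken in the order Th I through U IV, TS is assumed to be already connected to the lead of the current or next odd quadrant and RS to that of the current or next even quadrant, and the names used are hypothetical.

```python
DIGITS = ("Th", "H", "T", "U")
QUADRANTS = ("I", "II", "III", "IV")
LEADS = [f"{digit} {quadrant}" for digit in DIGITS for quadrant in QUADRANTS]


def lead_for(stage):
    """Return the P.C.I. lead read in a given pulsing stage, or None past the end."""
    return LEADS[stage - 1] if stage <= len(LEADS) else None


def pulsing_schedule():
    """Return rows of (stage, Z operated, lead on TS, lead on RS)."""
    rows = []
    for stage in range(1, len(LEADS) + 1):
        odd = stage % 2 == 1
        # TS is used in odd stages, RS in even stages; the idle relay is
        # already moved on to the lead it will need in the following stage.
        ts = lead_for(stage if odd else stage + 1)
        rs = lead_for(stage + 1 if odd else stage)
        rows.append((stage, odd, ts, rs))
    return rows


if __name__ == "__main__":
    for stage, z_operated, ts, rs in pulsing_schedule():
        print(stage, "Z" if z_operated else "-", ts or "", rs)
```

Each relay thus has a full stage in which to settle on its next lead before that lead determines the pulse sent.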

Figure 8 shows the revertive steering chain and
revertive pulse counter.
A STUDY OF THE DEFLECTION MECHANISM
AND SOME RESULTS ON RATE FINDERS

by

Claude E. Shannon


SUMMARY  OF  THE  MOST  IMPORTANT  RESULTS 

1. The deflection mechanism may be divided into three parts.
The first is driven by two shafts and has one shaft as output,
which feeds the second part. This unit has a single
shaft output which serves as input to the third part, whose
output is also a single shaft, used as the desired azimuth
correction.

2. The first unit is a simple integrator. Its output rate is


3. The second part is the same circuit as previous rate finders.
Its presence appears to be detrimental to the operation of
the system from several standpoints. The output e of this part
satisfies:

$$ e = x + y $$


Ll 


4. The third and most important part of the machine satisfies

$$ S q + R \dot{q} + \frac{L}{\sqrt{1 - \dot{q}^2}} \ddot{q} = e $$

in which:

e = an input forcing function which except for transients in
the second part and other small effects is the function
whose rate is to be found.

$\dot{q}$ = the rate of e as found by the device. The output of the
mechanism is $\sin^{-1} \dot{q}$.

R, L, S are positive constants depending on the gear ratios,
etc. in the machine.

The mechanism therefore acts like an R, L, C circuit in which
the differential inductance is a function of the current,

$$ \frac{L}{\sqrt{1 - \dot{q}^2}} $$

The system can be critically damped for differential displacements
near at most two values of the current.

Omitting the effect of backlash, the system is stable for any
initial conditions whatever, with a linear forcing function,
e = At + B. It will approach asymptotically and possibly with
oscillation a position where $\dot{q}$ is proportional to $\dot{e}$. An error
function can be found which decreases at a rate $-R(\dot{q} - \dot{q}_0)^2$,
$\dot{q}_0$ being the asymptotic value of $\dot{q}$.

If the system is less than critically damped, ordinary gear
play type of backlash can and will cause oscillation. This
includes play in gears, adders, lead screws, rack and pinions
and looseness of balls in the integrator carriages. The oscillation
is not unstable in the sense of being erratic, or growing
without limit, but is of a perfectly definite frequency and
amplitude. This type of backlash acts exactly like a peculiar
shaped periodic forcing function. Approximate formulas for
the frequency and amplitude of the oscillation are


r 


2 


and 


/s2  I   UoLd  -A)2 


<*0c 

$B_1$ and $B_2$ being the amounts of backlash in the two driven shafts
as measured in a certain manner.

8. Elastic deformations of shafts and plates can be divided into
two parts. One is exactly equivalent to the gear type of
backlash and may be grouped with $B_1$ and $B_2$ above. The other
has the effect of altering the parameters R, L, S of the circuit
and also adding higher order derivatives with small
coefficients. This will slightly alter the time constant and
the natural frequency of the system.

9. The manner in which the arcsin function is obtained seems to
me distinctly disadvantageous to the operation of the system
for a number of reasons, chiefly since to eliminate backlash
oscillation it requires high overdamping near $\dot{q} = 0$ and this
slows down the response for low target speeds.

10. The general problem of rate finding and smoothing is considered
briefly from two angles - as a problem in approximating
a certain given transfer admittance and as a problem
in finding the form of a differential equation. The first
method, based on a linear differential equation, leads to
tentative designs which I think would be an improvement over the
present one. The second method indicates the possibility of
still more improvement if non-linear equations can be
satisfactorily analyzed.

ANALYSIS  OF  THE  DEFLECTION  MECHANISM 

General Considerations. The deflection mechanism is a device
designed to find $\delta$ mechanically from the formula

$$ \sin \delta = \Sigma_a t_p $$

having one shaft whose rate of turning is $\Sigma_a$ and another whose
angular position is $t_p$, giving $\delta$ as the position of a shaft.
The system is also supposed to smooth out small errors in $\Sigma_a$.
The mechanism, as actually constructed, is shown in
Figure 1. By a rearrangement of adders, it may be drawn as shown
in Figure 2. Incidentally, the device of rearranging and combining
adder units is frequently useful in studying these systems. In
this case it both clarifies the physical operation and simplifies
the mathematical analysis. The box IV on the right of Fig. 1
represents two adders with, essentially, a common shaft. The
output is equal to the sum of the inputs with the indicated signs
prefixed. A variable associated with a shaft represents the angular
position of that shaft unless specifically stated otherwise.
Gears are omitted from the diagram but included as coefficients
in the equations. It may also be worthwhile to point out that the
best method of setting down the equation of such a system is
usually the following:

1. Considering only the integrators and function devices,
label the various shafts using the minimum number of variables,
working backward from driven to driving shafts. Thus if the output
of an integrator is labeled z, its displacement is $\dot{z}$ (assuming
constant disk rate). If the output of an x to ln x gear is sin u,
its input is $e^{\sin u}$. Working backwards gives the differential
instead of the integral form of the equation.

2. Now concentrate on the adders, grouping together as
many as possible, and write the equations of constraint. These
will be the equations of the system.

I find the use of electrical analogues very useful in
understanding these devices and have used throughout a notation
which emphasizes this idea.


As the machine is drawn in Fig. 2, it consists of three
independently operating units. The output of the first is a
single shaft serving as input to the second, the output of the
second a single shaft feeding the third, and the output of this
being a shaft used as $\delta$.

The operation is roughly as follows: Integrator I
multiplies its disk rate by its displacement, so that the rate
of turning of its output is $\dot{y} = k \, t_p \Sigma_a$. The actual position of
this y shaft can carry no significance. It is

$$ y = k \int t_p \Sigma_a \, dt + y_0 $$

a variable which depends on the entire previous history of the
sighting telescopes, to say nothing of possible integrator slippage.
At two different times, with a target at the same position and
speed, this shaft would have entirely different angular positions
but the same rate of turning.

The output of integrator I feeds into the middle part
of the system which is exactly the rate finder of most older
directors. This part of the device seems to me not only superfluous
but actually detrimental to the operation. It is equivalent
to an R, L circuit (Fig. 3) with impressed voltage y and
output x, the voltage across the inductance.


3. A small response h(t) for the function g(t).

High frequencies in g(t) appear practically undiminished
and in the same phase in h(t) since the
impedance is high compared to R.

Thus

$$ x = \frac{L_1}{R_1} A + K e^{-\frac{R_1}{L_1} t} + h(t) $$

In adder III, x is added to y in equal proportions to give e.

$$ e = y + \frac{L_1}{R_1} A + K e^{-\frac{R_1}{L_1} t} + h(t) $$

As we pointed out above, y already contains an irrelevant additive
constant, so the addition of another, $\frac{L_1}{R_1} A$, which happens to be
proportional to the target rate, is of no possible significance. The
term $K e^{-\frac{R_1}{L_1} t}$ certainly is only detrimental, being an unwanted
transient. For a time I thought that the reason for the middle
part of the machine was the final term h(t). For high frequencies
this is approximately g(t), and might be used to buck out
these high frequency following errors, much as was done in some
early radio circuits to reduce a-c hum. However, a study of the
design diagrams shows that the two error functions are actually
in phase as I have indicated in the equation, so that these high
frequency errors are added, making the situation worse. Even if
the phase of x were reversed on entering adder III, I think it
doubtful whether the presence of this part of the system would be
justifiable. It would be necessary to show that the frequencies
were high enough so that the two actually did cancel, and also
that the disadvantages of the transient term did not overcome the
advantages obtained. Note that the middle part can function in
no way as a rate finder. The right hand part of the machine does
its own rate finding as we will see, and the rate found by the
middle part could not possibly be used because of the undetermined
constant in y.
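The constant and transient terms contributed by the middle part can be checked numerically on the equivalent R, L circuit. The sketch below is an illustration, not part of the original analysis: it assumes a pure ramp input y = At with zero initial current (so that the constant K equals $-\frac{L_1}{R_1} A$ and h(t) vanishes), and the function names are hypothetical.

```python
import math


def inductor_voltage(t, ramp_rate, resistance, inductance):
    """Closed-form voltage across the inductance for y = A*t, zero initial current."""
    tau = inductance / resistance
    return ramp_rate * tau * (1.0 - math.exp(-t / tau))


def simulated_inductor_voltage(t_end, ramp_rate, resistance, inductance, steps=20000):
    """Forward-Euler integration of L di/dt + R i = A t; returns x = y - R i."""
    dt = t_end / steps
    current, t = 0.0, 0.0
    for _ in range(steps):
        current += dt * (ramp_rate * t - resistance * current) / inductance
        t += dt
    return ramp_rate * t_end - resistance * current


if __name__ == "__main__":
    A, R1, L1 = 2.0, 4.0, 1.0
    print(inductor_voltage(2.0, A, R1, L1))            # approaches (L1 / R1) * A
    print(simulated_inductor_voltage(2.0, A, R1, L1))  # numerical cross-check
```

Once the transient has died away, x settles to the constant $\frac{L_1}{R_1} A$, which merely shifts e by a fixed amount proportional to the target rate.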

We proceed now to the third part of the machine which
is the major concern of the study. Concentrating on the adder IV,
the equation of the system is obviously

$$ L \frac{d}{dt} \sin^{-1} \dot{q} = e - S q - R \dot{q} $$

or

$$ S q + R \dot{q} + \frac{L}{\sqrt{1 - \dot{q}^2}} \ddot{q} = e $$

This is the equation of a series R, L, C circuit with the
inductance a function of the current passing through it.
Inductance $\Lambda$ may be defined by the Lagrangian equations or by
and it is clear from the above equation that

$$ \Lambda i = L \sin^{-1} i $$

or

$$ \Lambda = L \frac{\sin^{-1} i}{i} $$

This function varies as shown in Figure 4. For our work, however,
a more useful parameter is what is sometimes called the differential
inductance $L_D$, which may be defined by

$$ L_D = \frac{d(\Lambda i)}{di} $$

so that in our case

$$ L_D = \frac{L}{\sqrt{1 - i^2}} $$

This inductance is useful when we have an equilibrium current $\dot{q}_0$
and are considering the effect of small variations about this
equilibrium. Omitting second order terms, the system will be equivalent
to one with constant R, L, C parameters, the inductance being
taken as $L_D$. The variation of $L_D$ with current is shown in Figure 5.


The  action  is  the  opposite  of  that  of  a  "swinging"  choke  where,  be- 
cause of  saturation,  the  differential  inductance  decreases  with 
large  currents. 

The  mechanical  idea  behind  the  operation  of  this  system 
is  quite  simple.    Suppose  shaft  e  to  be  turning  at  a  constant  rate. 
The  system  will  be  in  equilibrium  if  the  displacement  of  integrator  V 
is  such  as  to  make  its  output  feeding  into  the  adder  equal  and  op- 
posite to  e,  and  the  displacement  of  integrator  VI  at  zero.  Under 
these  conditions,  shaft  q  measures  the  rate  of  e  and  shaft  V,  the 
output of the device, the arcsin of this rate. If the rates are not correct, the adder changes the second derivative shaft in such a direction as to equalize the rates. The q shaft serves as a damper to prevent continual oscillation about the equilibrium position.
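The equilibrium described above can be illustrated numerically. The following Python sketch integrates the system equation $Sq + R\dot q + L\ddot q/\sqrt{1-\dot q^{2}} = At + B$ for a linear input $e = At + B$ and reports the settled rate. The function name `simulate`, the fourth-order Runge-Kutta integrator, and all numerical constants are assumptions chosen for illustration; the stop on the sine gear is not modelled, so the sketch is only meaningful while $|\dot q| < 1$.

```python
import math

def simulate(S, R, L, A, B, q0=0.0, v0=0.0, t_end=60.0, dt=1e-3):
    """Integrate S*q + R*q' + L*q''/sqrt(1 - q'^2) = A*t + B by classical RK4.

    Returns the final (t, q, q'). Valid only while |q'| < 1 (no stop modelled).
    """
    def accel(t, q, v):
        # q'' solved from the differential equation
        return (A * t + B - S * q - R * v) * math.sqrt(max(0.0, 1.0 - v * v)) / L

    t, q, v = 0.0, q0, v0
    for _ in range(int(round(t_end / dt))):
        k1q, k1v = v, accel(t, q, v)
        k2q, k2v = v + 0.5 * dt * k1v, accel(t + 0.5 * dt, q + 0.5 * dt * k1q, v + 0.5 * dt * k1v)
        k3q, k3v = v + 0.5 * dt * k2v, accel(t + 0.5 * dt, q + 0.5 * dt * k2q, v + 0.5 * dt * k2v)
        k4q, k4v = v + dt * k3v, accel(t + dt, q + dt * k3q, v + dt * k3v)
        q += dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += dt
    return t, q, v

# Illustrative constants only; none of these values come from the memorandum.
S, R, L, A, B = 1.0, 1.0, 0.25, 0.5, 0.0
t, q, v = simulate(S, R, L, A, B)
print("rate shaft q':", v, "compared with A/S:", A / S)
print("arcsin of the rate:", math.asin(v))
```

With these constants the rate shaft settles at $A/S$ and $q$ follows the input with a constant lag, which is the equilibrium the text describes.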
MATHEMATICAL THEORY (Backlash not Present)


Differential Operation

If e is turning at a constant rate and the system is at equilibrium, and then a small differential disturbance is applied to the system, it will clearly respond very nearly like an R, L, C circuit with constant parameters, the inductance used being the differential inductance for the equilibrium current $\dot q_0$,

$$L_{\text{eff}} = \frac{L}{\sqrt{1-\dot q_0^{2}}}$$

Such a system has a time constant of

$$T = \frac{2L_{\text{eff}}}{R} = \frac{2L}{R\sqrt{1-\dot q_0^{2}}}$$

It is critically damped if

$$R^{2} = 4L_{\text{eff}}S = \frac{4LS}{\sqrt{1-\dot q_0^{2}}}$$

which, of course, only occurs at

$$\dot q_0 = \pm\sqrt{1-\frac{16L^{2}S^{2}}{R^{4}}}$$

For values of $\dot q_0$ greater in absolute value than this, the system is oscillatory; for values less, overdamped.
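The damping condition can be tabulated directly. The sketch below assumes the relations $L_{\text{eff}} = L/\sqrt{1-\dot q_0^{2}}$ and $R^{2} = 4L_{\text{eff}}S$ from this section; the function names and the sample constants are hypothetical.

```python
import math

def effective_inductance(L, qdot0):
    """Differential inductance at the equilibrium current qdot0 (|qdot0| < 1)."""
    return L / math.sqrt(1.0 - qdot0 ** 2)

def time_constant(R, L, qdot0):
    """Time constant 2*L_eff/R of the linearized circuit."""
    return 2.0 * effective_inductance(L, qdot0) / R

def critical_current(R, L, S):
    """|qdot0| at which R^2 = 4*L_eff*S, or None if no equilibrium current gives it."""
    x = 1.0 - 16.0 * L ** 2 * S ** 2 / R ** 4
    return math.sqrt(x) if x >= 0.0 else None

def regime(R, L, S, qdot0, tol=1e-12):
    """Classify the linearized response about the equilibrium current qdot0."""
    d = R ** 2 - 4.0 * effective_inductance(L, qdot0) * S
    if abs(d) <= tol:
        return "critically damped"
    return "overdamped" if d > 0.0 else "oscillatory"

# Illustrative constants only.
print(critical_current(1.0, 0.2, 1.0))
print(regime(1.0, 0.2, 1.0, 0.3), regime(1.0, 0.2, 1.0, 0.8))
```

When `critical_current` returns `None`, the system is oscillatory at every equilibrium current, since even $\dot q_0 = 0$ leaves $R^{2} < 4LS$.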
Proof of General Stability with Linear e

In proving the stability of this system, I have used a method which may be new in some respects. It was suggested by the fact that in a non-dissipative mechanical system, the potential energy U is a minimum at a point where the system is differentially stable, and the method is, in a sense, a generalization of that criterion. It is not, however, limited to differential stability, or to non-dissipative systems. Since the method may be of use in other investigations of this type, I will first describe it in general terms.

Suppose we have a differential equation system in which n variables and derivatives may be specified independently in the initial conditions. We will say that the system is stable for all initial conditions and all driving functions if any two solutions of the system with the same driving functions approach each other in the sense that

$$\lim_{t \to \infty} \sum_{i=1}^{n} |x_i - y_i| = 0$$

where $x_1(t), x_2(t), \ldots, x_n(t)$ is one solution and $y_1(t), \ldots, y_n(t)$ the other. If this limit is zero for certain types of driving functions, we will say the system is stable for these functions.

Theorem: If a continuous function $Q(x_1 \ldots x_n, y_1 \ldots y_n, t)$ can be found having the following properties:

1. $Q \ge 0$ for all $x_i$, $y_i$, t, the equality holding if and only if $x_i = y_i$.

2. $\frac{dQ}{dt} \le 0$ at all times, when the $x_i$ and $y_i$ are solutions of the system with the same driving function.

3. It is impossible for Q to remain indefinitely $\ge A > 0$.

Then the system is completely stable.

For the function Q is non-increasing but always $\ge 0$ and must therefore approach a limit $A \ge 0$ as $t \to \infty$, but by 3, $A > 0$ is impossible; hence $A = 0$, and each $|x_i - y_i| \to 0$.

Conversely, it can be shown that if only a single forcing function is involved, and the system is stable for this function, a Q exists of the type described.

Roughly, the method is to find a "distance" or "error" function Q between two solutions which is zero only when the solutions are identical and which always decreases.

As an example of this method it is easy to prove the complete stability of the ordinary R, L, C circuit with constant parameters without solving the equation. The differential equation is

$$Sq + R\dot q + L\ddot q = e$$

and we choose q and $\dot q$ as coordinates. Let two solutions be $q_1$, $\dot q_1$ and $q_2$, $\dot q_2$ and consider the function $Q = \frac{S}{2}(q_1-q_2)^{2} + \frac{L}{2}(\dot q_1-\dot q_2)^{2}$. Condition 1 is obviously satisfied. Now

$$\frac{dQ}{dt} = S(q_1-q_2)(\dot q_1-\dot q_2) + L(\dot q_1-\dot q_2)(\ddot q_1-\ddot q_2) = -R(\dot q_1-\dot q_2)^{2} \le 0$$
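The behaviour of this distance function can be observed numerically. The Python sketch below integrates two solutions of $Sq + R\dot q + L\ddot q = e$ from different initial conditions under the same driving function and records $Q = \frac{S}{2}(q_1-q_2)^{2} + \frac{L}{2}(\dot q_1-\dot q_2)^{2}$ at every step. The integrator, the driving function, and all constants are assumptions made for illustration.

```python
import math

def rk4_step(f, t, q, v, dt):
    """One classical RK4 step for q' = v, v' = f(t, q, v)."""
    k1q, k1v = v, f(t, q, v)
    k2q, k2v = v + 0.5 * dt * k1v, f(t + 0.5 * dt, q + 0.5 * dt * k1q, v + 0.5 * dt * k1v)
    k3q, k3v = v + 0.5 * dt * k2v, f(t + 0.5 * dt, q + 0.5 * dt * k2q, v + 0.5 * dt * k2v)
    k4q, k4v = v + dt * k3v, f(t + dt, q + dt * k3q, v + dt * k3v)
    return (q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0,
            v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0)

def distance_history(S, R, L, e, s1, s2, t_end=20.0, dt=1e-3):
    """Values of Q along two solutions (q, q') started at s1 and s2."""
    f = lambda t, q, v: (e(t) - S * q - R * v) / L
    Q = lambda a, b: 0.5 * S * (a[0] - b[0]) ** 2 + 0.5 * L * (a[1] - b[1]) ** 2
    t, hist = 0.0, [Q(s1, s2)]
    for _ in range(int(round(t_end / dt))):
        s1 = rk4_step(f, t, s1[0], s1[1], dt)
        s2 = rk4_step(f, t, s2[0], s2[1], dt)
        t += dt
        hist.append(Q(s1, s2))
    return hist

# Arbitrary driving function and constants, chosen only for illustration.
hist = distance_history(1.0, 0.5, 1.0, lambda t: math.sin(3.0 * t) + 0.2 * t,
                        (1.0, 0.0), (-0.5, 0.3))
print("Q at start:", hist[0], " Q at end:", hist[-1])
```

The recorded values never increase, whatever driving function is supplied, because the driving function cancels when the two equations are subtracted.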
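$$\ldots\ \frac{S}{2}\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}\right)^{2}$$

Obviously the minimum of Q with respect to q occurs at

$$q = \frac{At}{S} + \frac{B}{S} - \frac{RA}{S^{2}}$$

Also

$$\frac{\partial Q}{\partial \dot q} = \frac{L\left(\dot q - \frac{A}{S}\right)}{\sqrt{1-\dot q^{2}}}$$

which vanishes only for $\dot q = A/S$. It is readily verified that this is a minimum, and that Q is zero at this point for any t. Now

$$\frac{dQ}{dt} = \frac{\partial Q}{\partial q}\,\dot q + \frac{\partial Q}{\partial \dot q}\,\ddot q + \frac{\partial Q}{\partial t} = S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}\right)\left(\dot q - \frac{A}{S}\right) + \frac{L\left(\dot q - \frac{A}{S}\right)}{\sqrt{1-\dot q^{2}}}\,\ddot q$$

and

$$\frac{L}{\sqrt{1-\dot q^{2}}}\,\ddot q = At + B - Sq - R\dot q$$

if q and $\dot q$ satisfy

$$Sq + R\dot q + \frac{L}{\sqrt{1-\dot q^{2}}}\,\ddot q = At + B.$$

Hence

$$\frac{dQ}{dt} = \left(Sq - At - B + \frac{RA}{S}\right)\left(\dot q - \frac{A}{S}\right) + \left(\dot q - \frac{A}{S}\right)\left(At + B - Sq - R\dot q\right) = -R\left(\dot q - \frac{A}{S}\right)^{2} \le 0$$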

Note that this rate is identical with that found in the linear case. Incidentally, it was by working backward from this rate that a suitable function Q was first found.
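The rate $-R(\dot q - A/S)^{2}$ can be checked numerically for the nonlinear equation $Sq + R\dot q + L\ddot q/\sqrt{1-\dot q^{2}} = At + B$. The sketch below assumes an explicit form for Q: the quadratic term $\frac{S}{2}\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}\right)^{2}$ plus the integral of $L(u - A/S)/\sqrt{1-u^{2}}$ from $u = A/S$ to $u = \dot q$. That closed form is a reconstruction consistent with the partial derivative $L(\dot q - A/S)/\sqrt{1-\dot q^{2}}$, not a formula quoted from the memorandum; the function names and constants are likewise hypothetical. The derivative of Q along a solution is estimated by a central difference over two short Runge-Kutta steps.

```python
import math

def Q(t, q, v, S, R, L, A, B):
    """Assumed error function: zero at the equilibrium q' = A/S, positive elsewhere."""
    a = A / S
    x = q - A * t / S - B / S + R * A / S ** 2
    g = L * (math.sqrt(1 - a * a) - math.sqrt(1 - v * v)
             - a * (math.asin(v) - math.asin(a)))
    return 0.5 * S * x * x + g

def rk4_step(t, q, v, h, S, R, L, A, B):
    """One RK4 step (h may be negative) of the nonlinear equation."""
    def acc(t, q, v):
        return (A * t + B - S * q - R * v) * math.sqrt(1.0 - v * v) / L
    k1q, k1v = v, acc(t, q, v)
    k2q, k2v = v + 0.5 * h * k1v, acc(t + 0.5 * h, q + 0.5 * h * k1q, v + 0.5 * h * k1v)
    k3q, k3v = v + 0.5 * h * k2v, acc(t + 0.5 * h, q + 0.5 * h * k2q, v + 0.5 * h * k2v)
    k4q, k4v = v + h * k3v, acc(t + h, q + h * k3q, v + h * k3v)
    return (q + h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0,
            v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0)

def rate_of_Q(t, q, v, S, R, L, A, B, h=1e-4):
    """Central-difference estimate of dQ/dt along the solution through (t, q, v)."""
    qp, vp = rk4_step(t, q, v, +h, S, R, L, A, B)
    qm, vm = rk4_step(t, q, v, -h, S, R, L, A, B)
    return (Q(t + h, qp, vp, S, R, L, A, B) - Q(t - h, qm, vm, S, R, L, A, B)) / (2 * h)

# Illustrative constants and state only.
p = dict(S=1.0, R=0.8, L=0.3, A=0.4, B=0.1)
t, q, v = 2.0, 1.5, -0.3
print(rate_of_Q(t, q, v, **p), -p["R"] * (v - p["A"] / p["S"]) ** 2)
```

The two printed numbers agree to within the accuracy of the finite difference, and the same comparison holds at any state with $|\dot q| < 1$.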

For Q to approach a limit K > 0, it is necessary for $\ddot q$ to approach zero, and q, therefore, to approach a linear function of t differing by a constant from its equilibrium value. But from the original differential equation $\ddot q$ must then approach a constant different from zero, which contradicts $\ddot q \to 0$. This does not, however, quite complete the stability proof, due to a certain mechanical peculiarity of the system. Let us plot the equilevel lines of Q against axes $X = q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}$ and $Y = \dot q$ (Figure 6).


The x to sin x gear in the actual mechanism has a limited movement, and is prevented from going too far by a slip clutch and stop. If $|\dot q| = K$, the stop prevents $|\dot q|$ from increasing any more. The original equation is replaced by

$$\ddot q = 0$$

until the pressure on the stop reverses. So far we have shown that under the original equation Q always decreases. In terms of our plot this means that if we start a solution inside the curve marked C, the solution will certainly converge to the equilibrium position, for the solution can never "escape" from C and hit one of the two lines $Y = \pm K$, where the differential equation changes. When we are not on one of these lines a solution will, in fact, spiral inward in the clockwise sense, as may be seen by writing the differential equation in the form

$$\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}\right) + \frac{R}{S}\left(\dot q - \frac{A}{S}\right) = -\frac{L}{S}\,\frac{\ddot q}{\sqrt{1-\dot q^{2}}}$$


Consider the signs of $\ddot q$ and $(\dot q - A/S)$ in the four quadrants about the equilibrium position. In I, for example, $(\dot q - A/S) > 0$ and the X coordinate of a solution must increase with t; $\ddot q < 0$ so $\dot q$ must decrease, giving a clockwise sense to the motion. Similarly the other quadrants may be verified. Some of the solutions starting outside of C will hit one of the lines, but the solution will still be stable. It is easy to show, by a study of the signs of the variables and their rates, that a solution can only hit the upper line to the left of the point $P_1$ with coordinates $X = \frac{1}{S}\left(\frac{RA}{S} - RK\right)$ and $Y = K$, and that if one does, it will move along the line to the right until it reaches $P_1$ and then return to the original equation. A similar situation holds for the lower line. If we should start a solution on the upper line to the right of $P_1$ it would leave the line immediately. The solution is always horizontal (i.e. $\ddot q = 0$) on the line through $P_1$, the equilibrium point and $P_2$.

If R = 0 the function Q is constant since $\frac{dQ}{dt} = 0$ and therefore the solutions of the equation

$$Sq + \frac{L}{\sqrt{1-\dot q^{2}}}\,\ddot q = At + B$$

are the equilevel curves in Figure 6.

I have attempted in several different ways to generalize this proof for arbitrary input functions e(t), but so far have no completely rigorous proof. However, some of the arguments come so near as to make me almost certain of complete stability. It can be shown, for example, that two different solutions with the same e(t) cannot definitely diverge; i.e. $|q_1 - q_2| + |\dot q_1 - \dot q_2|$ cannot become and remain greater than some positive constant (assuming e and e' bounded). Also if two solutions get close together (with respect to both q and $\dot q$), they will certainly converge.

The Effect of Backlash

In order to understand how backlash can cause oscillation, let us first consider a much simplified case. Suppose we have a second order linear system which is less than critically damped with no backlash (Figure 7).

$$Sq + R\dot q + L\ddot q = e$$

If, at t = 0, we suddenly impress e = E (constant) on the system ($q = \dot q = 0$), the response is a damped oscillation (Figure 8).
Now in the mechanical system there are only two driven shafts, A and B, and backlash only affects (directly) the operation of these. Probably the greatest backlash is in the adder system driving shaft A. Let us assume for a moment that this is the only backlash present and that its action is as follows. When shaft A reverses direction (i.e. when $\dddot q = 0$) there is a short period ... shaft ... as measured from the e shaft. It is ... that the response of the system with backlash is the same as the response would be if there were no backlash and at the time ... (previously decreasing and about to increase) we turn the e shaft an amount $B_1$ and in such a way as to ... during this turning. Similarly at the next reversal we give e an increment $B_1$, keeping $\ddot q$ constant through the period of backlash. In other words, the response is that of a system without backlash on which we impress as forcing function a wave which is about as shown in Figure 9.
If the periods of backlash are comparatively short, the small connecting portions (actually quadratic polynomials in time) will have little effect on the response. That is, we can assume a square topped wave with little error in $\dot q$ or q especially, due to the smoothing operation of the integrators (or, said another way, due to the high impedance of the circuit to high frequencies).

Now suppose that there is a certain amount of backlash in shaft B. The action of this is to cause the carriage of the upper integrator to remain stationary for a small period when $\ddot q = 0$. The same effect would be achieved if, at this time, we suddenly impressed on e a pulse which held the lower integrator at zero and kept changing e at such a rate as to keep the lower integrator there. We keep the integrator at zero long enough so that its output would have turned an amount equal to the backlash in B and then suddenly return it to its proper value. This means that the area of the pulse must equal the backlash. The shape of this pulse would be a linear function of time, but here again it is not highly significant.

The entire system may thus be replaced by one which is free of backlash and subject to a driving function of the type shown in Figure 10, where $B_1$ is the backlash in A as measured from e and $B_2$ is the amount in B as measured from e (in the sense that if e covers an area $B_2$, shaft B moves an amount equal to its backlash).

It is easy to see from our diagram that this forcing function is in the correct phase to sustain the oscillation rather than let it decay.

The fundamental component of this forcing function is easily found. We have

$$A_1 = \frac{2}{T}\int_0^{T} e \,\sin\frac{2\pi t}{T}\,dt$$

e may be split into a sum - one term for the square wave and one for the pulse-like $B_2$ part. The $B_2$ pulse is all concentrated near the center of the sine wave where it is nearly unity. Hence approximately

$$A_1 = \frac{2}{T}\cdot 2\int_0^{T/2} \frac{B_1}{2}\,\sin\frac{2\pi t}{T}\,dt + \frac{4B_2}{T} = \frac{2B_1}{\pi} + 4f_0B_2$$
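The approximation $A_1 \approx \frac{2B_1}{\pi} + 4f_0B_2$ (with $f_0 = 1/T$) can be compared against a direct numerical integration. Since Figure 10 is not reproduced here, the sketch below assumes a particular shape for the forcing function: a square wave of peak-to-peak height $B_1$, plus narrow rectangular pulses of area $+B_2$ and $-B_2$ centered on the positive and negative peaks of the sine wave. The pulse shape, the pulse width, and all constants are illustrative assumptions.

```python
import math

def fundamental_sine_component(T, B1, B2, width, n=100000):
    """Midpoint-rule estimate of A1 = (2/T) * integral_0^T e(t) sin(2*pi*t/T) dt
    for the assumed forcing function (square wave plus two narrow pulses)."""
    dt = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        e = 0.5 * B1 if t < 0.5 * T else -0.5 * B1   # square wave, steps of B1
        if abs(t - 0.25 * T) < 0.5 * width:           # pulse of area +B2
            e += B2 / width
        elif abs(t - 0.75 * T) < 0.5 * width:         # pulse of area -B2
            e -= B2 / width
        total += e * math.sin(2.0 * math.pi * t / T) * dt
    return 2.0 * total / T

def fundamental_formula(T, B1, B2):
    """Approximate fundamental component 2*B1/pi + 4*f0*B2 with f0 = 1/T."""
    return 2.0 * B1 / math.pi + 4.0 * B2 / T

T, B1, B2 = 2.0, 0.02, 0.005   # illustrative values only
print(fundamental_sine_component(T, B1, B2, width=0.02), fundamental_formula(T, B1, B2))
```

The numerical value falls slightly below the formula because a pulse of finite width samples the sine wave a little away from its peak; the gap closes as the width shrinks.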

The period T of this oscillation is the natural damped period of the system, to within a small error of size comparable to the length of time during which backlash is effective. Hence its frequency is approximately

$$f_0 = \frac{1}{2\pi}\sqrt{\frac{S}{L} - \frac{R^{2}}{4L^{2}}}$$

and the magnitude of the fundamental component of the response $\dot q$ is

$$|\dot q| = \frac{\dfrac{2B_1}{\pi} + 4f_0B_2}{\sqrt{R^{2} + \left(\omega_0 L_D - \dfrac{1}{\omega_0 C}\right)^{2}}}$$

where $\omega_0 = 2\pi f_0$ and $C = 1/S$.

Providing the quantity $\frac{2B_1}{\pi} + 4f_0B_2$ is small, the deflection mechanism will behave linearly about its equilibrium position and the above formulae would approximately hold. If $|\dot q| \ne 0$ the equilibrium value of inductance $\frac{L}{\sqrt{1-\dot q^{2}}}$ would probably be as good as any to use, since the differential inductance is greater on one side and less on the other. At $\dot q = 0$ the inductance is greater on each side and a somewhat higher value should be used, depending on $\frac{2B_1}{\pi} + 4f_0B_2$. If the system is more than critically damped, q may or may not have an inflection point depending on the initial conditions. If they are such that the driven shafts do not reverse, backlash cannot take effect and there should be no oscillation. However, if they do reverse once, the system may receive the equivalent of a "kick" in such a direction as to cause another reversal and so on, so that oscillation is set up. This problem has not been very well decided, but if this happens, the amplitude formula above should still hold, while the frequency formula will not.
The question of "spring backlash", i.e. undesired effects due to elastic deformations of shafts and mounting plates, has been raised. According to Hooke's Law the angular strain in a shaft is proportional to the applied torque. This torque in a shaft whose position is x(t) can probably be very well approximated by an equation of the form

$$T = \pm K_1 + K_2 x' + K_3 x''$$

the first term, whose sign is that of $x'$, being due to a coulomb friction load, the second to a viscous friction load and the third an accelerating torque.

It is clear that the coulomb friction term $K_1$ can be combined with the ordinary gear type backlash treated above, and acts, therefore, like a periodic forcing function. The effect of the other terms is quite different; their presence causes small changes in the parameters R and L of the circuit and also adds higher derivatives to the equation. Let us consider only the spring in the shafts feeding $\frac{L}{\sqrt{1-\dot q^{2}}}\,\ddot q$ (i.e. assume q driven ...)

$$(Sq - P_1\dot q - P_2\ddot q)$$
$$(R\dot q - r_1\ddot q - r_2\dddot q)$$

or

$$Sq + (R - P_1)\,\dot q + \left(\frac{L}{\sqrt{1-\dot q^{2}}} - P_2 - r_1\right)\ddot q - r_2\,\dddot q = (e - \alpha_1\dot e - \alpha_2\ddot e) = e_1(t)$$


Spring in the drive to q has a similar effect although complicated by the non-circular sine gears.

If e is a linear function of t, so is $e_1$ and the forcing function thus contains nothing to create a sustained oscillation. The left-hand side differs only by small quantities from the ideal equation

$$Sq + R\dot q + \frac{L}{\sqrt{1-\dot q^{2}}}\,\ddot q = e_1$$

and will therefore surely approach the solution

Thus we see that the "spring type" of backlash cannot cause sustained oscillation as the "gear" type of backlash can. However, if the gear type is present, the spring type can aid oscillation by reducing the damping. It may be necessary to overdamp in some cases in order to get an effective critical damping.

It should be pointed out that the gear type of backlash may not be quite as simple as we have assumed, particularly in the shafts driving $\frac{L}{\sqrt{1-\dot q^{2}}}\,\ddot q$. If the integrator carriage load is large compared to the friction loads in the adders and gears, then we are probably justified in assuming that gear pressures in the drive only reverse when the driven shaft reverses. However, if this is not the case, a backlash effect can easily take place at other times, for example when one of the shafts feeding the adder reverses, without necessarily reversing the driven shaft. The situation could become quite complicated, the equivalent input function containing several different sized steps occurring at different times. However, the fundamental frequency should still be approximately the natural damped frequency of the system, providing the backlash effects are small and occur only during a small fraction of the time.

The fact that backlash can cause a sustained oscillation leads to a criticism of the design of the mechanism, in particular to the method whereby the arcsin function is obtained. Note that reducing the amount of gear backlash $\frac{2B_1}{\pi} + 4f_0B_2$ will reduce the amplitude of oscillation proportionately, but apparently the only way to eliminate it completely is to at least critically damp the system for all equilibrium points, so that the shafts do not, in general, reverse direction. In the deflection mechanism as it stands, this would be distinctly disadvantageous, for if we critically damp at the maximum values of $|\dot q|$ (the governing points) the system will be much over-damped near $\dot q = 0$, and in fact for most values of $\dot q$, due to the shape of the inductance curve.

Another related argument against the manner of getting the arcsin is that the response to high frequency error functions depends on the value of $\dot q$. It seems to me that the treatment of error functions should be independent of the target speed - what is best for one will be best for another - since the prediction error we can tolerate is an absolute quantity, not dependent on the target speed. There may be some objection to this argument on the grounds that at higher target speeds the error function is apt to be larger, and hence the circuit should have a larger impedance, but even so it would only be accidental if the peculiar variation introduced by the sine gear was anything like an approximation to the desired variation.

Finally, a minor argument against the position of the sine gear is that the equation becomes so difficult to handle mathematically. A design of this type must be largely intuitive or experimental - there is not much chance of choosing the constants for the best operation by a mathematical formulation, or of determining the speed of response etc. analytically.

These difficulties might be avoided in several ways. The arcsin might, for example, be introduced as in Figure 11.


No doubt the reason this was not done was because with $|\dot q|$ near 1, running the sin x gear backward is not mechanically practical, the gearing up ratio being too great. This objection could be overcome in two ways - either a new gear k arcsin x to x (k large) could be used and the parameters R, L, S all decreased by a factor of k (or the integrator disks might be speeded up in suitable ratios), or, if this were not mechanically feasible, a rapid response servo mechanism could be introduced in the output, Figure 12.


This system can, by the way, be solved in closed analytic form when $\dot e$ is a constant, and reduced to a quadrature in any case. The essential feature of this circuit is that the functions of rate finding and smoothing, and of taking the arcsin, have been isolated. Each part can be designed to do its own job the best without compromise. It may be noted that the arcsin circuit above also performs a smoothing operation which depends on target speed. By suitable choice of the parameters we can make this large or small as we desire.
The Ideal Rate Finder and Smoother

Let us consider the problem of rate finding and smoothing from a general standpoint and ask what mathematical operation a machine should perform to act as the "best possible" rate finder. Of course, this question has many answers, depending chiefly on what assumptions we make as to the input function, and what mathematical limitations we put on the machine. We shall assume throughout that the input function e(t) consists of a series of linear parts with curved connecting portions and with a small superimposed error function, and that we only desire the rate during (that is, some time after the start of) a linear part. In this section we assume there are no limitations whatever on the machine - that we can build a machine to perform any operations we can describe, in particular those a mathematician might use to solve the problem. Now there is considerable experimental and theoretical justification to the theory that the best way to fit a curve of a given type to a set of points subject to an observational error is in the least square sense. If we assume this to be true in our case, and attempt to fit a straight line to the last a seconds before $t_1$ of the curve e(t), we must minimize the integral

$$I = \int_{t_1-a}^{t_1} \left[e - (At+B)\right]^{2} dt$$

with respect to A and B. The quantity a represents the length of the curve used in the fitting process. We would like to use as much of the curve as actually represents a linear segment to get the best accuracy, but certainly no more. A person doing the curve fitting could look at e(t) and see fairly well where the curve showed a real tendency to depart from linearity, and select accordingly. Mathematically it could be done as follows. Suppose the standard deviation of the error is $\sigma$ and that errors of more than say $4\sigma$ are almost certainly due to a significant departure from linearity in the curve. We could choose a such that it is as large as possible without making the error $|e - (At+B)|$ (A, B chosen to minimize I), for $t_1 - a \le t \le t_1$, greater than $4\sigma$. In other words we use as much of the curve as we can assume linear within observational errors. As a final refinement of the solution it might be desirable to include a weighting function W(t, a) in the integral I, weighting the more recent values more heavily. The final evaluation of the rate is then the value of A given when we minimize the function

$$I(A, B, a) = \int_{t_1-a}^{t_1} \left[e - (At+B)\right]^{2} W(t, a)\, dt$$

on A and B, a fixed, giving A and B as functions of a, and then choose a as large as possible with

$$|e - (At+B)| \le K\sigma, \qquad t_1 - a \le t \le t_1$$
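A discrete version of this procedure is easy to state in code. The Python sketch below works on sampled data rather than a continuous curve, uses uniform weighting (W = 1), and searches the window length by brute force; these choices, the function names, and the test signal are assumptions made for illustration and are not proposed in the text.

```python
def fit_line(ts, es):
    """Least-squares line e = A*t + B through the samples (ts, es)."""
    n = len(ts)
    tm = sum(ts) / n
    em = sum(es) / n
    stt = sum((t - tm) ** 2 for t in ts)
    ste = sum((t - tm) * (e - em) for t, e in zip(ts, es))
    A = ste / stt
    return A, em - A * tm

def rate_from_tail(ts, es, sigma, K=4.0, min_points=3):
    """Fit the longest trailing window whose residuals all stay within K*sigma.

    Returns (A, B, a), where A is the rate and a is the window length in time.
    """
    for m in range(len(ts), min_points - 1, -1):
        wt, we = ts[-m:], es[-m:]
        A, B = fit_line(wt, we)
        if max(abs(e - (A * t + B)) for t, e in zip(wt, we)) <= K * sigma:
            return A, B, wt[-1] - wt[0]
    raise ValueError("no window satisfies the tolerance")

# Noise-free test signal: slope 2 up to t = 5, then slope -1.
ts = [0.1 * k for k in range(100)]
es = [2.0 * t if t < 5.0 else 10.0 - (t - 5.0) for t in ts]
A, B, a = rate_from_tail(ts, es, sigma=0.01)
print("rate:", A, "window length:", a)
```

On this signal the search discards every window that reaches back past the corner at t = 5 and returns the slope of the final linear part. The need to hold the whole window of samples in `ts` and `es` is the memory requirement discussed next.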

This solution can be put into a more explicit form, but even when greatly simplified it appears that it would be quite difficult to carry out the calculations accurately by mechanical means. The main difficulty is that apparently such a machine must be capable of remembering exactly the past history of an arbitrary function, e or something derived from it. The only methods I know of doing this are quite inaccurate, or else very complex, and it seems likely that the gain in mathematical precision of the above formulation would be more than offset by a loss in mechanical precision.

Differential Analyzer Types of Machines

To become a bit more practical, let us now confine our attention to machines of what might be called the differential analyzer type. By this, we mean machines constructed of a finite combination of adders, integrators, and function elements (e.g. non-circular gears). Two shafts e(t) and kt enter the machine and one shaft u(t) leaves the machine. It can be shown that any such system must satisfy a differential equation of the type

$$f(q, \dot q, \ldots, q^{(n)}, t) = e(t)$$

with

$$u(t) = q^{(i)}.$$

First, we ask what can be said about the form of this equation to make the machine act as a satisfactory rate finder in our sense.

1. With the same initial conditions and the same e(t) the machine should certainly respond the same independent of the time of start. Hence f does not depend on t.

2. When e = At + B the equation must have an equilibrium solution

$$q^{(i)} = A, \qquad q^{(i+1)} = 0$$

$$q^{(i-1)} = At + C.$$
If i > 1, the carriage of an integrator will be continuously moving in the equilibrium condition. This does not seem practical, for the initial conditions may be anything depending on past history, and the integrator would surely go off scale in many cases. Obviously from the equilibrium solution, i is not 0, for this would imply a constant equal to a linear function of time. Hence i = 1 and $q' = u(t)$.

3. Let

$$f(x, y) = f(x, y, 0, \ldots, 0)$$

Due to the equilibrium solution

$$f(At + C, A) = At + B$$

for all A, B, t. Differentiating with respect to t,

$$\frac{df}{dt} = \frac{\partial f(x, y)}{\partial x}\,A = A$$

so that

$$f(x, y) = x + h(y)$$

4. Assuming f is fairly "well behaved", we have near $\ddot q = \dddot q = \ldots = q^{(n)} = 0$ (i.e. near equilibrium)

$$f = f(q, \dot q, 0, 0, \ldots, 0) + a_2\ddot q + \ldots + a_n q^{(n)} = q + h(\dot q) + a_2\ddot q + \ldots + a_n q^{(n)}$$

and the differential operation depends on the coefficients $a_2 \ldots a_n$ and $h(\dot q)$. As this differential operation should not depend on t, the $a_i$ must be independent of q, for in equilibrium q changes with t. They may depend on $\dot q$, however, in which case the differential operation depends on the target speed, which may or may not be desirable. In the deflection mechanism this is the case, $a_2 = \frac{1}{\sqrt{1-\dot q^{2}}}$.

5. With $\dot q$ near A the above reduces to

$$f = q + a_1\dot q + a_2\ddot q + \ldots + a_n q^{(n)} + b$$

where $a_1 = h'(A)$ and $b = h(A) - A\,h'(A)$. To eliminate backlash oscillation the roots of this equation should all be real, and for stability all should be negative, for all desired A.

6. For complete stability, there are no doubt further requirements on the form of f. This problem, however, is still unsolved.

The above are only requirements on the form of f so that it actually does find a satisfactory rate. To find the best form of f would require a very elaborate mathematical analysis, if possible at all.

If we restrict our machine still further and assume a linear differential equation with constant coefficients, it is possible to give a fairly rational analysis leading to the best values of the coefficients. The question is this. Given the equation

$$a_0 q + a_1 q' + \ldots + a_n q^{(n)} = e$$

what values of the coefficients $a_0 \ldots a_n$ give the best rate-finding smoothing properties? From what we said above, it seems that the characteristic equation

$$a_0 + a_1 p + \ldots + a_n p^{n} = 0$$

should have only real negative roots and that the rate found will be q'. We may normalize the equation by assuming $a_0 = 1$ so that q' is actually the rate and not merely proportional to it. In the Heaviside symbolic notation, we have

$$q' = \frac{p}{\prod_{k=1}^{n}(b_k p + 1)}\,e$$

writing the polynomial in the factored form. The $b_k$ are positive real numbers and are the time constants in the transient part of the response. We assume the $b_k$ arranged in increasing magnitude.

Let us frame the problem as follows. Keeping the speed of response of the circuit the same, what values of the $b_k$ give the best attenuation of the error function? Of course, the trouble appears in trying to decide what we mean by keeping the speed of response the same. One answer is that we keep the maximum time constant, that is $b_n$, the same. This may be partially justified on the following grounds:

1. For "almost all" initial conditions, the term $A_n e^{-t/b_n}$ will eventually dominate the transient response, the other terms becoming arbitrarily small in comparison. The only time when this fails is when the coefficient $A_n$ happens to come out zero.

2. In the worst cases (other coefficients small in comparison) the $b_n$ term dominates for all t, and the machine should perhaps be designed with the worst conditions as governing.

3. If we use this criterion, it is easy to show that for best attenuation of error frequencies all the $b_k$ should be equal. For the magnitude of the transfer admittance (e to q') is

$$|Y| = \frac{\omega}{\prod_{k=1}^{n}\sqrt{1 + b_k^{2}\omega^{2}}}$$

which is obviously smallest when each $b_k$ is made as large as possible, for all frequencies. That is, each $b_k = b_n$, the maximum.
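The claim about equal time constants can be checked by direct evaluation. The sketch below assumes the transfer admittance magnitude $\omega / \prod_k \sqrt{1 + b_k^{2}\omega^{2}}$ and compares two sets of time constants sharing the same maximum; the function name and the sample values are hypothetical.

```python
import math

def admittance_magnitude(omega, bs):
    """|Y| = omega / prod_k sqrt(1 + (b_k*omega)^2) for time constants bs."""
    mag = omega
    for b in bs:
        mag /= math.sqrt(1.0 + (b * omega) ** 2)
    return mag

equal = [2.0, 2.0, 2.0]     # every time constant at the maximum b_n = 2
unequal = [0.5, 1.0, 2.0]   # same maximum, smaller remaining constants
for omega in (0.1, 1.0, 10.0):
    print(omega, admittance_magnitude(omega, equal), admittance_magnitude(omega, unequal))
```

At every frequency the set with all time constants equal to the maximum passes less of the error function, since each factor in the denominator grows with its $b_k$.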

Another way the "same speed of response" might be interpreted is in terms of the expected area under the transient time curve. Keeping the standard deviation of this area constant seems to give the same evaluation of the $b_k$ as above but there are certain statistical assumptions in my proof that may render it invalid.

If the characteristic equation has real roots, it may be set up nicely as in Figure 13.

This circuit appears to have an advantage from the backlash point of view over the more obvious one shown in Figure 14.

It seems quite possible, however, that the use of nonlinear equations could offer a real advantage. Consider the equation

[equation illegible in source]

where the three coefficients are functions of [illegible in source]. When the system is at equilibrium, it acts approximately like

[equation illegible in source]

and these three constants could be adjusted to give critical damping and a good attenuation of the error function frequencies. On the other hand, when we are not at or near equilibrium, [illegible in source] is (usually) considerably different from zero. The values of the three coefficients could be adjusted to give a very rapid response, and thus approach the equilibrium position faster. It is possible, however, that there is some fundamental error in this reasoning, for example, that an attempt to do this would necessarily cause oscillation.

A HEIGHT DATA SMOOTHING MECHANISM
Claude E. Shannon


The schematic diagram of a new type of height data smoothing mechanism is shown in Figure 1. The discontinuous height data $e(t)$ is fed into the input shaft at intervals. This drives a differential, connected also to the ball carriage and roller of an integrator whose disk is turned by a constant speed motor. A correcting handwheel and the integrator roller feed another differential whose output is the output of the device. The output and input of the machine are compared through a differential feeding a dial. The operator is supposed to turn the handwheel in such a way that the positive and negative oscillations of the dial about zero are equal.

The actual height of the target $h(t)$ is a continuous function of time and we may assume that just after each reading $e(t)$ is an approximation to this. Thus $h(t)$ and $e(t)$ might be as shown in Figure 2.

The shaft $y(t)$ clearly satisfies the equation

(1) $y + \frac{1}{m}\,y' = e(t)$.

The $x$ shaft satisfies

(2) $x(t) = y(t) + c(t)$

and the dial reads

(3) $D(t) = e(t) - x(t)$.

During the period between height readings the position of the $e(t)$ shaft is constant, say $e(t_n)$, the reading taken at $t_n$,

$$y + \frac{1}{m}\,y' = e(t_n)$$

$$y = e(t_n) + A_n e^{-m(t - t_n)}, \qquad t_n \le t \le t_{n+1}$$

Since $y$ is obviously continuous, it will follow a curve consisting of a series of connected exponentials, each with the same time constant, $1/m$. The continuity of the curve implies

assuming the intervals between readings the same, say $a$ seconds, the response $y$ for two different time constants, $ma = \ln 2$ and $ma = \ln 10$, are shown in Figure 3.

The larger the time constant, the more the lag in response of $y(t)$, but the smoother the curve. This may be seen another way: the $e$ to $y$ system is equivalent to an R, L circuit with position of shafts analogous to voltage as shown in Figure 4. With the time constant small $y$ follows $e$ closely including the irregularities. With the time constant large $y(t)$ is smooth compared to $e$ but lags considerably.
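The trade-off between lag and smoothness can be seen by stepping the connected exponentials directly. This is a minimal sketch, assuming the shaft obeys $y + y'/m = e(t_n)$ on each interval and that readings arrive every $a$ seconds; the function name `smooth`, the starting value of $y$ and the sample readings are hypothetical.

```python
import math

def smooth(readings, m, a, samples_per_interval=4):
    """Return (t, y) pairs for y + y'/m = e(t_n) on each interval of length a."""
    y = readings[0]              # assumption: y starts on the first reading
    out = []
    for n, e_n in enumerate(readings):
        gap = y - e_n            # offset of y from the new reading at the step
        for i in range(samples_per_interval + 1):
            t = i * a / samples_per_interval
            out.append((n * a + t, e_n + gap * math.exp(-m * t)))
        y = out[-1][1]           # continuity: next interval starts where this one ended
    return out

# One step from 0 to 1, sampled for the two products m*a = ln 2 and m*a = ln 10.
for label, m in (("ma = ln 2", math.log(2)), ("ma = ln 10", math.log(10))):
    print(label, [round(y, 3) for _, y in smooth([0.0, 1.0], m, 1.0)])
```

With $ma = \ln 2$ the gap to the new reading halves over one interval; with $ma = \ln 10$ it falls to one tenth, so $y$ follows $e$ more closely but reproduces more of its irregularity.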

Movement of the handwheel does not affect $y(t)$ but shifts $x(t)$ up or down with respect to $y$. If the operator turns the wheel to give equal positive and negative movements of the dial, it may be seen that in the "steady state" (say with $h(t) = at$) there is a constant lag even when the damping is low and the interpolation nearly linear. In this case the system bridges linearly between the mid-ordinates of the steps, while actually it should bridge between the points $(t_n + 0)$. With higher damping the shape becomes worse but the interpolated exponentials are nearer to the true curve most of the time. We shall find a formula for the best time constant of the system under the following assumptions:

1. That the "best" time constant is the one making the actual error least in the mean square sense.

2. That we may take as the true curve, so far as our knowledge goes, the linear interpolation between the points $t_n + 0$. This may be justified by the fact that the device cannot in any way perform higher order interpolation: the curve $y(t)$ is convex upward whenever $e(t)$ increased in its last step over the final value of $y$ from the preceding step, and this is quite independent of the curvature of $h(t)$.

3. That the system is in a "steady state", that is, that in the step under consideration $y(t)$ ends at the same distance below $e(t)$ as it was just before the step.

4. That the steps come at approximately equal intervals of $a$ seconds.

An interval under these conditions is shown in Figure 5. Here we assumed that the handwheel was turned to give a ratio of $-c/b$ as deflection of the dial just after to just before a step.

We have
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y  -  A  e 


with 


ylo)  -  b  -  y(a) 
A  -  b  •  a  e" 


Hence 
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also 


l-e 


s  -  y  -  y(o)  +c 


-    1  -  <3"BA 

-  o  —  s—     +  c 

-am

The integral of the squared error per second is then

It is evident from physical considerations that the minimum of this expression occurs for a fairly large $D$. In fact the error curve was plotted for $k = .5$ (Figure 6) and the minimum is seen to be at about 7 or 8. With $D$ this large the above expression is very nearly equal to

since $e^{-D}$ is very small. To locate the minimum we have


2*  -  jL  -  2D  (2  +  3k )  -  2  f ( 2  ♦  4k )  3  +  3]  .  Q 
D2      D3  4  D2 


$$(6 - 8k)\,D = 16$$

whence

$$D = \frac{8}{3 - 4k}$$

For $k = \tfrac{1}{2}$,

$$D = 8$$

Since the minimum is so flat (Figure 6) this formula is certainly close enough. However a second approximation may be found as follows: for $x$ small, $\frac{1}{1 - x} \approx 1 + x$. Using this in the exact expression to eliminate the denominators we get as a second approximation


2e'D) 


-  tl*k)  U+e"D)  -  J5  llWD)  -  ±  (l*e-3)  e"3 


J 




£5  -  0  «  -  8  ♦  (3- 4k)  D  +  [6D  (D*l)  *  2D3  lk-1)]  e~D+  6D  (D+l) 


Using the first approximation to obtain the values involving exponentials, a better value may be obtained. For $k = \tfrac{1}{2}$ the second approximation is $D = 8.03$. The first and second approximations are plotted in Figure 7.
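The two approximations can be reproduced with a few lines of arithmetic. This is a hedged sketch: it assumes the first approximation is $D = 8/(3 - 4k)$ and that the second-approximation condition reads $0 = -8 + (3 - 4k)D + [6D(D+1) + 2D^3(k-1)]\,e^{-D}$, dropping the final term of that condition because its exponential factor cannot be read in the source. The function names are illustrative.

```python
import math

def first_approx(k):
    """First approximation to the best D, ignoring all exponential terms."""
    return 8.0 / (3.0 - 4.0 * k)

def second_approx(k):
    """Evaluate the exponential term at the first approximation, then re-solve for D."""
    d1 = first_approx(k)
    correction = (6 * d1 * (d1 + 1) + 2 * d1 ** 3 * (k - 1)) * math.exp(-d1)
    return (8.0 - correction) / (3.0 - 4.0 * k)

print(first_approx(0.5))             # 8.0
print(round(second_approx(0.5), 2))  # 8.03
```

For $k = \tfrac{1}{2}$ the correction is about 0.027, which is consistent with the quoted values 8 and 8.03 and with the remark that the minimum is flat.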

With $k = \tfrac{1}{2}$ the curve $x(t)$ is plotted for an interval with the "best" $D$, in Figure 8. It will be noted that the curve is highly damped in comparison to the time between readings. The RMS error is then equal to


It is interesting to compare this with the RMS errors obtained under other conditions. If the device is not used at all, but a direct coupling made between the input and output, the RMS error between the step function and the linear interpolation between points $t_n + 0$ is

$$\left(\frac{E}{b}\right)^2 = \frac{1}{a}\int_0^a \left[0 - \left(-\frac{t}{a}\right)\right]^2 dt$$

$$\frac{E}{b} = \frac{1}{\sqrt{3}} = .577$$

so that the RMS error has been reduced to 40% of this value.
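The direct-coupling figure is easy to confirm numerically. The sketch below assumes the error over one interval, measured in units of the step height, grows linearly as $t/a$; the function name and the midpoint-rule integration are illustrative choices.

```python
import math

def rms_step_error(a=1.0, steps=100_000):
    """RMS of t/a over 0 <= t <= a, by the midpoint rule."""
    dt = a / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += (t / a) ** 2 * dt
    return math.sqrt(total / a)

print(round(rms_step_error(), 3))   # 0.577
```

The result does not depend on the interval length `a`, since the error is expressed as a fraction of the step height.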

In Figure 9, the output of the smoothing mechanism, $x(t)$, is plotted for a certain forcing function $e(t)$, using the "best" value of $m$. It may appear that the output is still far from smooth, and this is in a sense true, but it must be remembered that the variations in $e(t)$ are here greatly exaggerated over what would be expected in practice.

Finally it should be pointed out that a very material improvement in operation could be obtained if the operator were trained to turn the handwheel to obtain a ratio $c/b$ nearer to zero than $\tfrac{1}{2}$. This, however, would probably be impractical.

SOME EXPERIMENTAL RESULTS
ON THE DEFLECTION MECHANISM

Claude E. Shannon


June 26, 1941

In a previous report, "A Study of the Deflection Mechanism and Some Results on Rate Finders," a mathematical study was made of a new type of deflection mechanism. The present paper is a further study of this device and a report on some experimental results obtained on the M.I.T. differential analyzer.

For convenience in reference, the schematic diagram of the machine is repeated in Fig. 1. In the report mentioned, the utility of the middle part of the device was questioned. This arose from a misunderstanding of the basic assumptions underlying the design and was cleared up in a conference with Dr. Tappert. The writer's analysis was under the assumption that the mechanism was designed to find rates for linear forcing functions only (i.e., that higher order terms were small by comparison), and the analysis is still valid if this is true. However, in practice, it appears necessary to assume higher order forcing functions and the deflection mechanism is designed to give the correct steady state rate (except for the non-linearity of the sine gear) for an arbitrary quadratic forcing function. Actually the middle part (often referred to hereafter as the "x" part) of the device is certainly well worth while, as will be seen from some of our experimental curves.

If a linear mechanism has a transfer admittance $Y(j\omega)$ from input $e(t)$ to output $\dot q(t)$ then

$$j\omega\,Q(j\omega) = Y(j\omega)\,E(j\omega)$$

where $E$ and $Q$ are the transforms of $e$ and $q$. It is easily seen from transform theory that if $e(t) = at + b$, a necessary and sufficient condition that $\dot q(t) \to a$ as $t \to \infty$ is that

$$\frac{Y(j\omega)}{j\omega} \to 1 \quad \text{as } \omega \to 0$$

If this condition is satisfied the system may be called a first order rate finder: after the transient has died out, the output is the derivative of the input whenever the latter is linear. Similarly if

$$Y(0) = 0, \qquad Y'(0) = j, \qquad Y^{(k)}(0) = 0, \quad k = 2, 3, \ldots, n$$

we have an nth order rate finder: in the steady state it finds the rate of an nth degree polynomial forcing function. In the deflection mechanism we have a second order rate finder

$$Y(j\omega) = j\omega + c_3\,\omega^3 + c_4\,\omega^4 + \ldots$$

if we assume $1/\sqrt{1 - \dot q^2}$ nearly 1. A circuit for solving

$$\delta = \sin^{-1} \dot e$$

under the same approximation, to the nth order is shown in Fig. 2. The admittance here is approximately

$$Y(j\omega) = j\omega\,\frac{1 + a_1(j\omega) + \ldots + a_{n-1}(j\omega)^{n-1}}{1 + a_1(j\omega) + a_2(j\omega)^2 + \ldots + a_{n+1}(j\omega)^{n+1}}$$

With the values of the constants in the mechanism this is

$$Y(j\omega) = j\omega\,\frac{1 + 4.63\,(j\omega)}{1 + 4.63\,(j\omega) + 5.73\,(j\omega)^2 + 1.094\,(j\omega)^3}$$
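The order of a rate finder can be read off from the coefficients alone. The sketch below assumes the admittance has the form $Y(j\omega) = j\omega\,N(j\omega)/D(j\omega)$ with $D(0) = 1$; then $Y/j\omega - 1 = (N - D)/D$, and the order equals the number of leading coefficients on which $N$ and $D$ agree. The function name is hypothetical, and the cubic case uses a denominator $(1 + j\omega)^4$ purely as an illustration.

```python
def rate_finder_order(numerator, denominator):
    """Count leading coefficients (constant term first) shared by N and D."""
    order = 0
    for i, d in enumerate(denominator):
        n = numerator[i] if i < len(numerator) else 0.0
        if abs(n - d) > 1e-12:
            break                # first power of (jw) at which N and D differ
        order += 1
    return order

# Deflection mechanism: N = 1 + 4.63 s, D = 1 + 4.63 s + 5.73 s^2 + 1.094 s^3
print(rate_finder_order([1, 4.63], [1, 4.63, 5.73, 1.094]))   # 2

# Illustrative cubic rate finder: N = 1 + 4 s + 6 s^2, D = (1 + s)^4
print(rate_finder_order([1, 4, 6], [1, 4, 6, 4, 1]))          # 3
```

An order of 2 means the steady-state output is exact for any quadratic forcing function in the linear approximation, which is the design property described above.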

In the previous report it was pointed out that due to a clutch and stop on the input to the sine gear values of $\dot q$ were limited to two horizontal lines (see Fig. 6 in that report). There is also a clutch and stop on the displacement of the lower integrator. This effectively further limits solutions to a parallelogram as shown in Fig. 3. Actually the limitation is fictitious: the $q$ shaft can turn an unlimited amount, but when this stop is in effect the stability point moves at such a speed as to be equivalent to $q$ and $\dot q$ moving along one side of the parallelogram. Thus if we keep the stable point stationary paths of representative solutions will be as indicated in Fig. 3.

The trial solutions taken on the differential analyzer may be classified as follows:

I. Solutions taken with the mechanism as designed.

   A. Simple analytic forcing functions.

      1. $e(t) = a$
      2. $e(t) = at + b$
      3. $e(t) = at^2 + bt + c$
      4. $e(t) = at^3 + bt^2 + ct + d$

   B. Response for 8 typical target courses, the target vector velocity constant.

   C. The response to some error functions superposed on typical courses.

   D. An attempt to get backlash oscillation.

II. Approximately the same program although less extensively with the middle part eliminated.

III. A few runs with typical courses using three different third order rate finders.

The constants of the target courses used were as follows (see Fig. 4):
Course I        $S_g$ = 150 yds/sec = 307 mi/hr

7  «  2,000  yds 
h^  -  1,000  yds 

$     m  0° 


Course  II        8    •  150  yds/seo 
g 


2,000  yd. 
h^  -  500  yds 


*  "0 

Course  III       8    -  150  yds/seo 
8 

V  -  4,000  yds 
ha  •  1,000  yds 

•  -  0 
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Course  IT 


S    -  150 

V    -  2,000 

h    -  2,000 
in 

0    -  0 


Course  Y 


Course  VI 


S    -  150 
S 

V  -  4,000 

in 

h    -  4,000 
in 

9    -  -  14.96° 

V  -  4,000  -  40  t 

S„  -  150 

V  -  2,000 
m 

h    -  1-000 

M 

*    -  -  14.96° 

V  -  2,000  -  40  t 


Course  VII 


B    -  96.6 

e 

V    -  3,000 


hn  -  1.000 
6  -  -  60° 
V  -  3,000  -  115  t 


Course  VIII  8-150 
g 

V  -  4,000 
hm  -  500 
•  •  0 


The distribution of these courses is indicated in Fig. 5, together with the approximate maximum range of the 3" A.A. gun (21 sec. fuse setting).


The actual input to the deflection mechanism is

r*  s  h  t 

a       o  p 

but since it was desired to compare the actual output with the true deflection

$$\sin^{-1} \dot e$$

the quantity $\dot e$ was plotted against $t$ and integrated to provide the input. To calculate $\dot e$ the following method was found to be the simplest. We have

8  h  t 

'  --P  **- 

o  p 


A computation schedule was set up based on this formula, working backwards from the time of burst $t + t_p$ to the present time.

I  II  III 

(assumed) 

t  ♦  t  h  V 

P  P  p 

"  h/l*£8g(t*tp)J2         -  yi-  (ftp)Sgtan  *] 
IV  T  VI  VII 


*p  t  /  78— IT 


from  -  I  -  TV 

ballistic 
curves 


The ballistic data used in getting $t_p$ (IV) was read from the chart Fig. 24 (opposite p. 59), Coast Artillery Field Manual, FM 4-110. The value of $t_p$ was merely read off corresponding to the computed values of $r_p$ and $h_p$.

If we assume as an approximation that the shell velocity is constant, $k$ yds/sec (i.e., that the equi-time of flight curves in the chart are circles) so that with V constant


,  2.2      .2  „2 
k  t    «  h    +  V 

P  P 

h    -  h    +  S  (t+t  )  ' 
p       m       gv  p' 


p  m 


h/h"  ♦  S  t2 


we can eliminate $t_p$ and $h_p$ from the system to obtain the following equation between $\dot e$ and $t$:


e2[k2(hm*Sgt)2(h^2)-  (h2*S^)V2S2] 


+  *[2  vsWhfVTt2]  -  C^5T2*TT2(h  *ts  )2]  -  o 

g  m  n    g    '      1    g  m     g  m*  m  g'J 

Evidently the same curve $\dot e(t)$ is obtained if $h_m$ and $S_g$ are both multiplied by the same constant.

The differential analyzer set-up used is shown in Fig. 6. An attempt was made to generate the sine function with two integrators solving

but this was found impractical because of the large integrator loading necessary, and an input table was used instead. Even in this case it was necessary to use a very large scale factor on the independent variable shaft due to the small integrating factors (1/32) of the differential analyzer as compared to the ball type (about 1 under comparable conditions). This resulted in solutions which represented, actually, 30 seconds requiring 30 minutes of machine time.

The equations of the deflection mechanism are

$$\dot x + .54\,x = .54\,\dot e$$

$$\frac{\ddot q}{\sqrt{1 - \dot q^2}} + 4.700\,\dot q + 1.692\,q = 1.692\,e + 4.700\,x$$

It was necessary to approximate the coefficients with available gear ratios on the differential analyzer. Fortunately some very close approximations were found. The equations actually set on the machine were

$$\dot x + .54\,x = .54\,\dot e$$

$$\frac{\ddot q}{\sqrt{1 - \dot q^2}} + 4.706\,\dot q + 1.694\,q = 1.694\,e + 4.706\,x$$

The error is of the same order as the expected machine error.
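The effect of the gear-ratio approximation on the linearized admittance can be worked out directly. This sketch assumes the linearized equations are $\dot x + c_x x = c_x \dot e$ and $\ddot q + c_d \dot q + c_s q = c_s e + c_d x$, which give a denominator $(1 + s/c_x)(1 + (c_d/c_s)s + s^2/c_s)$; the function and parameter names are hypothetical.

```python
def linear_coefficients(c_x, c_d, c_s):
    """Coefficients a1, a2, a3 of 1 + a1*s + a2*s^2 + a3*s^3 for the linearized system."""
    t1 = 1.0 / c_x        # time constant of the x part
    t2 = c_d / c_s        # damping term of the q part
    t3 = 1.0 / c_s        # inertia term of the q part
    return (t1 + t2, t1 * t2 + t3, t1 * t3)

design = linear_coefficients(0.54, 4.700, 1.692)    # mechanism as designed
machine = linear_coefficients(0.54, 4.706, 1.694)   # as set on the analyzer

print([round(c, 3) for c in design])    # close to 4.63, 5.73, 1.094
print([round(c, 3) for c in machine])
```

Under these assumptions the two sets of coefficients differ only in the fourth significant figure, in line with the remark that the approximation error is comparable to the machine error.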

Except for runs in group I-D the machine was made as "tight" as possible, the backlash being corrected by frontlash units. Due to the large scale factors used and the high inherent precision of the integrators used in the differential analyzer, the runs may be expected to be more accurate than the actual deflection mechanism.

Solutions were taken in the form of both curves and counter readings. The curves given here were reproduced by pantograph to ordinary graph paper size. Curves not directly drawn by the machine and numerical values quoted are taken from the counter printings, which give an additional decimal place not readable from the curves.

Discussion  of  Runs 

Most of the curves are given with $\dot q$ as dependent variable. To estimate the error in yards for a given error in $\dot q$ from $\dot e$, the chart of Fig. 6A may be used. This is computed from the approximate formula

r  cos  t  IS 

.  r££L*  Aq  -  r  A(e,q)  Aq 
/l-F 

For rough comparisons the coefficient $A$ may be taken as 1, the error then being the $\dot q$ error multiplied by the predicted range.

The first set of runs taken were with a sudden impulse $e = k\,1(t)$ with the system at rest, both with and without the middle part of the mechanism. Runs were taken with

k = 0.1, 0.2, 0.4, 1.0, 2.0

Typical curves are shown in Figs. 7 and 8. The results are very close to computed curves on the assumption that $1/\sqrt{1 - \dot q^2} = 1$ when $k < .4$, but above this the non-linearity becomes appreciable. In the worst cases the transient disappeared to within machine errors in 25 seconds, and for most cases within 8 to 12 seconds. The action with the middle part out was considerably more rapid than with it in, the transient being 6 times as great, as had been predicted, this being a special case of a linear forcing function. Fig. 9 is a plot of the time required for the transient in $\dot q$ to reduce to 2/10 of its maximum value. For values of $k$ greater than about .35 the curves cross the axis once with the middle part in. The curves with it out are all identical with $k > 2$, due to the action of the slip clutch on one integrator.

Next a series of runs were taken with

$$e = k\,t\,1(t)$$

starting from rest, with

$$\sin^{-1} k = \text{steady state } \delta = 15°,\ 30°,\ 45°,\ 60°,\ 75°,\ 80.6°$$

the last being the limit of the sine gear, the maximum possible deflection. These runs are shown in Figs. 10 and 11. The transient died out in all cases within 20 seconds except with $x$ in for $\delta > 75°$ in which cases 30 seconds or more was required, due to the action of the slip clutch. These long transients, however, would probably not be troublesome since such large deflections would only occur in practice with the plane almost directly overhead. For the smaller values the response is about equally rapid with $x$ in or out.
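Runs of this kind can be imitated with a small numerical integration. The following is a sketch under stated assumptions, not a model of the differential analyzer: it takes the mechanism equations to be $\dot x = .54(\dot e - x)$ and $\ddot q = \sqrt{1 - \dot q^2}\,[1.692(e - q) + 4.700(x - \dot q)]$, ignores clutches, stops and backlash, and uses a fixed-step Runge-Kutta scheme. The function name, step size and run length are hypothetical choices.

```python
import math

def simulate(e, e_dot, t_end=40.0, dt=0.01, linear=False):
    """Integrate the assumed mechanism equations from rest; return final (x, q, q_dot)."""
    def deriv(t, state):
        x, q, qd = state
        # Sine-gear factor; set to 1 for the linearized system.
        gear = 1.0 if linear else math.sqrt(max(0.0, 1.0 - qd * qd))
        x_dot = 0.54 * (e_dot(t) - x)
        qdd = gear * (1.692 * (e(t) - q) + 4.700 * (x - qd))
        return (x_dot, qd, qdd)

    state, t = (0.0, 0.0, 0.0), 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(t, state)
        k2 = deriv(t + dt / 2, tuple(s + dt / 2 * k for s, k in zip(state, k1)))
        k3 = deriv(t + dt / 2, tuple(s + dt / 2 * k for s, k in zip(state, k2)))
        k4 = deriv(t + dt, tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        t += dt
    return state

# Ramp e = k t with k = 0.5 (steady-state deflection 30 degrees), sine gear included.
ramp = simulate(lambda t: 0.5 * t, lambda t: 0.5)
# Quadratic e = a t^2 with a = .01, linearized: q_dot should approach 2 a t.
quad = simulate(lambda t: 0.01 * t * t, lambda t: 0.02 * t, linear=True)
print(ramp[2], quad[2])
```

Under these assumptions the rate shaft settles at $k$ for the ramp and follows $2at$ for the quadratic in the linearized case; with the sine-gear factor left in, the quadratic run shows a residual discrepancy of the kind attributed to the sine gear.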

Quadratic Forcing Functions

The runs with a quadratic forcing function

$$e = at^2$$

were the first to show the superiority of the mechanism with $x$ in. Runs were taken with

a = .01, .02, .03, .04, .10

With a quadratic rate finder the solution $\dot q$ should approach $2at$, and with $x$ in this was very nearly true, the discrepancy being due to the sine gear. Some solutions are shown in Figs. 12, 13, and 14. The errors increase with $a$ and with $\dot q$. The maximum slope found in any of the $\dot e$ courses plotted is about equivalent to an $a$ of .05 so that the large errors due to the sine gear with $a = .10$ need not cause great concern.

Cubic Forcing Functions

For cubic forcing functions the following were used

$$e_1 = -.04\,t^3 + .1\,t^2$$

$$e_2 = -.001\,t^3 + .05\,t^2$$

$$e_3 = -.0002\,t^3 + .02\,t^2$$

These were chosen as having second order tangency at $t = 0$ so that the transient is small. The results are shown in Figs. 15 and 16. The responses with $e_2$ and especially $e_3$ are very close to the calculated values on assuming the equation linear. The error in $e_1$ is somewhat greater as in the quadratic case with higher acceleration.

Effect of Backlash

A number of runs were made to determine the effect of backlash using several different forcing functions. In order to increase the amount of backlash, frontlash units were inserted at several critical points in the backwards direction. The results of these runs were, however, completely negative, for no oscillation of any sort was discovered. The system was given "shocks" by sudden turning of the $e$ shaft and other methods, but the solutions were completely stable. The only results were small consistent errors, of the order of magnitude of the backlash. It is possible that due to the large scale factors used in the set up, even the artificially introduced backlash was not sufficient to cause the oscillation effect.

Response for Typical Courses

The responses for the 8 courses described above are shown in Figs. 17 to 24. It may be noted that even on the flat courses (e.g., IV) the operation is poor without $x$. On the flat courses the response is satisfactory with $x$, the error being less than 20 yards except sometimes at the hump in $e$. However for the steeper courses errors of 60 or more yards are common after the start of the peak which do not disappear until nearly the end of the course. The action is particularly bad coming down the hump. Fig. 25 is a plot of the error in yards with course VIII, $x$ in.

Response to Error Functions

In Figs. 26 - 28 are shown the responses to some random error functions of various kinds superimposed on courses I and II. The operation in damping out the error is considerably better with $x$ out. However it seems from a consideration of the size of the errors introduced and the responses found that the system, even with $x$ in, damps the errors more than necessary. That is, it might be preferable to increase the speed of response so as to reduce the transient errors in the solutions.

Figs. 29 and 30 show the responses when we suddenly start tracking a target in courses I or II with the machine previously at rest, with the target at several points along the course.

Tests with Different Equations

Three runs were made on course VIII, the most difficult one of the group, using three different cubic rate finding equations. The equations used were (assuming linearity) critically damped, with the transfer admittances:


[i  ♦  2(>)r 

(2) $Y(j\omega) = j\omega\,\dfrac{1 + 4(j\omega) + 6(j\omega)^2}{[1 + (j\omega)]^4}$


The results of these runs are shown in Figs. 31, 32, and 33 and should be compared with Fig. 24. Of course, this gain is accompanied with a loss in error function damping. With the roots equal to 2 the system had a slight tendency to be unstable on the flat part of the course. This however appeared to be due to the "human backlash" in the operator on the sine table and would probably not be present with a sine gear.

It is easily seen that an increase in the values of the characteristic roots of the equation demands a proportional increase in the power requirements of the integrators. It may be that this will be a design limit in the case of mechanical systems. No difficulty would be experienced here however with electrical integrators.

The main conclusions of this work are as follows:

1. The middle part of the machine is definitely worth while. Although it increases response for accidental following errors, the gain in behavior for actual courses more than offsets this disadvantage.

2. The system behaves nearly enough like the linear system

$$1.094\,\dddot q + 5.73\,\ddot q + 4.63\,\dot q + q = 4.63\,\dot e + e$$

that this may be used to calculate its response to within a few per cent, providing $\dot q < .6$. As this corresponds to a deflection of 37°, the approximation is sufficient for most cases.

3. For targets whose elevation at their nearest point is greater than about 50° fairly large errors occur due to substantial cubic and higher degree terms in $e$. This indicates that it might be worth while to use a higher order rate finder. Tests made with a cubic rate finder showed greatly improved results.

4. If the additional cost of another integrator and adder required for cubic rate finding is too great to be justified it appears that the system could be improved by reducing the time constants, for if sufficient power is available from the integrators, the only disadvantage would be increased response to random error functions and our results indicate that they are now damped out more than necessary.

5. There is some indication that better results would be obtained by making the three time constants equal, or more nearly equal than they are now, although this is not certain.

Criteria for Consistency and Uniqueness in Relay Circuits

September 8, 1941

In a system of linear algebraic equations, there are three possible types of degeneracy, namely inconsistency (no possible solution), ambiguity (solutions not uniquely determined) and redundancy (more equations than necessary). Necessary and sufficient conditions are known for these types of degeneracy in terms of the ranks of the coefficient and augmented matrices. Somewhat similar effects can occur in the Boolean equations characterizing relay circuits, giving rise respectively to chattering, ambiguity of relay position for certain values of the independent variables, and redundancy [remainder of sentence illegible in source]. We [illegible in source] these conditions in terms of a circuit discriminant $F$.

Consider a relay circuit containing $n$ relays $R_1, R_2, \ldots, R_n$. Make and break contacts on $R_i$ are designated $a_i$ and $a_i'$, and we suppose that there are $m$ independent variables $e_1, e_2, \ldots, e_m$, which do not depend on the relay positions. Such a circuit is equivalent to the circuit of Fig. 1 in which

[expression illegible in source]

is the Boolean function which is zero when the switches and contacts $a_i$, $e_i$ are in such positions that the voltage across $R_i$ in the original circuit is sufficient to operate it and one otherwise. The function

[expression illegible in source]

will be called the circuit discriminant. We also define the following terms: a steady state in a relay circuit corresponding to a given set of values $A_1, A_2, \ldots, A_m$ of the independent variables is a set of positions $P_1, P_2, \ldots, P_n$ of the relays such that if the independent variables are given the values $A_i$, and the relays held in the positions $P_1, \ldots, P_n$ long enough for the steady state fluxes in the coils to build up, the relays will remain in the same positions indefinitely.

A completely oscillatory state of a relay circuit is a set of values $A_1, A_2, \ldots, A_m$ of the independent variables, such that no matter what the initial positions of the relays, or how long they are held in that position, once they are released at least one makes an infinite number of oscillations, i.e. chatters. In addition to these obviously exclusive possibilities a circuit may be "partially" oscillatory for certain values of the independent variables: with some initial conditions the circuit chatters and with others relapses into a steady state. An example is shown in Figure 2 where with the initial conditions

$R_1 = 0$ (operated)

the circuit chatters while with

[conditions illegible in source]

the circuit relapses into the steady state $R_1 = 1$, $R_2 = 1$.
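The definitions above lend themselves to a brute-force check on small circuits. The sketch below rests on an assumption, since the summand of the discriminant is not legible in the source: it takes $F$ to be 1 whenever some relay position $a_i$ disagrees with its operating function $f_i$, and 0 otherwise, with 0 meaning "operated" as in the text. The names `discriminant`, `steady_states`, `completely_oscillatory`, `buzzer` and `latch` are all hypothetical.

```python
from itertools import product

def discriminant(fs, a, e):
    """Assumed form: F = 1 unless every position a_i agrees with f_i(a, e)."""
    return int(any(f(a, e) != a_i for f, a_i in zip(fs, a)))

def steady_states(fs, e):
    """All relay positions for which the assumed discriminant vanishes."""
    return [a for a in product((0, 1), repeat=len(fs))
            if discriminant(fs, a, e) == 0]

def completely_oscillatory(fs, e):
    """True when F = 1 for every assignment of relay positions."""
    return not steady_states(fs, e)

# Hypothetical buzzer: one relay fed through its own break contact.
buzzer = [lambda a, e: 1 - a[0]]
# Hypothetical latch: one relay that holds itself through its own make contact.
latch = [lambda a, e: a[0]]

print(completely_oscillatory(buzzer, ()))   # True: no position is stable
print(steady_states(latch, ()))             # [(0,), (1,)]: both positions are stable
```

Enumeration of this kind identifies the steady states and the completely oscillatory case only; it says nothing about which steady state a circuit actually reaches from a given start, which depends on relay timing.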

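Theorem I. For $A_1, \ldots, A_m$; $P_1, \ldots, P_n$ to be a steady
state it is necessary and sufficient that

$f(P_1, \ldots, P_n;\ A_1, \ldots, A_m) = 0.$

This is necessary since in a steady state each relay is either
operated with sufficient voltage on its winding or unoperated
without it, so that every term of (1) vanishes when
$x_i = P_i$, $a_i = A_i$.

It is sufficient since each term of (1) must then vanish separately,
so that if the relays are held in these positions $P_i$ long
enough for fluxes to build up they will remain there.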

Theorem II. For $A_1, \ldots, A_m$ to be completely oscillatory
it is necessary and sufficient that

$f(x_1, \ldots, x_n;\ A_1, \ldots, A_m) = 1$

identically in the $x_i$. This is necessary since otherwise
there is a set of $x_i$, say $P_i$, such that $f = 0$, and
this is a steady state by Theorem I. It is sufficient
since, if true, then with any starting position, say
$P_1, \ldots, P_n$, at least one term of the sum (1) is equal to one,
so that the corresponding relay must change position. After some
relay has changed we still have the same situation since $f = 1$, so
that at least one relay makes an infinite number of changes
of position.

In case $f(x_1, \ldots, x_n;\ A_1, \ldots, A_m)$ is a function
of the $x_i$ (not identically one or zero) the system has
some steady states, namely the roots of $f = 0$, but for
arbitrary starting conditions we cannot say what the motion
will be. Whether a circuit seeks out a steady state or not
depends not only on the network topology as in Fig. 2, but
also on relay characteristics as in Fig. 3. Here if $R_1$ is
slow operating and $R_2$ very fast the circuit may chatter
with both relays initially unoperated, for $R_2$ may never
stay in long enough to operate $R_1$. If $R_1$ is fast and
$R_2$ slow release, the system relapses into $R_1 = 0$, $R_2 = 1$.
Hence no purely algebraic conditions can be set up to determine
whether a circuit will relapse into a steady state when
$f$ is a function of the $x_i$.
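The definitions of steady and completely oscillatory states can be checked by brute force for small circuits. The following Python sketch is illustrative only. It assumes the convention that 0 means operated (or closed) and 1 unoperated (or open), represents each relay by a hypothetical function `fs[i](x, a)` that returns 0 when the winding voltage suffices to operate relay i, and treats a set of positions as steady when every relay's position agrees with that function. The two example circuits (a self-interrupting buzzer and a self-holding relay) are assumptions made for the example, not the circuits of the figures.

```python
from itertools import product

def steady_states(fs, a):
    """Return all relay position tuples x that are steady for inputs a.

    fs[i](x, a) is 0 when relay i has enough voltage to operate, else 1;
    x[i] is 0 when relay i is operated, 1 when unoperated.
    """
    n = len(fs)
    return [x for x in product((0, 1), repeat=n)
            if all(fs[i](x, a) == x[i] for i in range(n))]

def completely_oscillatory(fs, a):
    """True when no set of relay positions is steady for inputs a."""
    return not steady_states(fs, a)

# Hypothetical buzzer: one relay fed through its own break contact.
# Unoperated (x=1): contact closed, winding energized -> f = 0.
# Operated (x=0): contact open, winding dead -> f = 1.
buzzer = [lambda x, a: 1 - x[0]]

# Hypothetical self-holding relay in parallel with a key a[0]: the
# winding is energized when the key is closed (0) or the relay is
# already operated (0).
holding = [lambda x, a: a[0] & x[0]]

print(completely_oscillatory(buzzer, ()))   # True
print(steady_states(holding, (1,)))         # [(0,), (1,)]
print(steady_states(holding, (0,)))         # [(0,)]
```

The enumeration says nothing about which steady state a circuit actually reaches from a given start; as the text notes, that depends on relay timing and not on algebra alone.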



July 15, 1943


Copy No.


Bel 
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BALLISTIC  EQUATIONS  ON  THE  ABERDEEN  ANALYZER 

by
Dr. C. E. Shannon of the Bell Telephone Laboratories


AMP  REPORT  NO.  28.1 


APPLIED  MATHEMATICS  PANEL 
NATIONAL DEFENSE RESEARCH COMMITTEE


This  is  a  report  on  Investigations  made  at  the  request 
of Dr. Warren Weaver (letter of December 28, 1942). Our study
has  been  based  partly  on  oral  information  received  in  Aberdeen 
(January  18,  1942)  and  partly  on  the  material  contained  in  the 
Report  No.  319  of  the  Ballistic  Research  Laboratory  ("Report 
on  the  Differential  Analyzer  at  Aberdeen  Proving  Ground"  by 
Major  A.  A.  Bennett,  December  1942).    The  technical  set-up 
as  described  in  that  report  will  in  the  sequel  be  referred  to 
as  "present  set-up".    It  should  be  clearly  understood  that  we 
were not to study possible technical improvements of the analyzer
as such nor to reexamine the theory underlying the differential
equations. Accordingly, the present report is concerned
only with an examination of the procedure of mechanical
integration  of  the  differential  equations  of  ballistics  as 
used  at  present.    Furthermore,  we  have  not  considered  any  methods 
of  integration  other  than  on  the  differential  analyzer. 

Before  proceeding  to  describe  devices  which  might 
contribute to the efficiency of the analyzer we wish to summarize
some  negative  findings,  as  these  may  render  superfluous  similar 
investigations  by  other  persons. 

a) We have carefully investigated a great number of
alternative set-ups, on the differential analyzer, of the differential
equations either in their present form or using
various new variables. However, we have been unable to find
any form superior to the method as used at present in Aberdeen
which, in our opinion, is the most efficient one.

b) We have studied the advisability of using some
method of successive approximations. Such methods naturally
present themselves since one should expect them to reduce the
ranges of the variables involved and thus increase the accuracy.
However, a closer study will show that it is almost invariably
necessary to subtract, on the analyzer, two large quantities
which are themselves independently obtained on the analyzer.
This, of course, nullifies the desired effect of reducing the
ranges. Various possibilities have been studied and, among
them, the possibility of starting with the vacuum trajectories
and integrating the difference between them and the actual
trajectories. Again we were unable to find a method which
would appear superior to the present set-up. It will be noted,
however, that the modification of the latter suggested below
can in some sense be interpreted as the first step in a method
of successive approximations.

c) Several perturbation methods and expansions
according to various parameters have been tried, paying special
attention to methods suggested in the newest Russian literature.
None of these methods seems appropriate for the analyzer.

Coming  to  the  less  negative  part  of  this  report  we 
remark  that  an  adequate  theory  of  errors  of  the  differential 
analyzer  is  not  available  at  present.    However,  simple  theoretical 
considerations  based  on  experience  gathered  at  M.I.T.  make  it 
appear that a very considerable part of the total error is due
[...] of error are backlash and, perhaps even more, inaccuracies in
the following mechanism for the input and vector tables. It
seems therefore possible to achieve a gain in accuracy by reducing
the range of the variables in the integrators, even
though this may necessitate the introduction of new adders
and gears. The following recommendations are based on this
assumption. We proceed step by step, starting with the simplest
case.


Recommendations.


1) Consider, to begin with, the horizontal displacement
$x$. Obviously $dx/dt$ will range from its maximum $\dot{x}_0$ at
the beginning to some fraction of it, say $q\dot{x}_0$, at the end.
Accordingly, when integrating in the usual form

(1)    $x = \int \dot{x}\, dt$

the integrand ranges from $q\dot{x}_0$ to $\dot{x}_0$. Now this means that
only a fraction $\frac{1-q}{2}$ of the total range of the integrator
disc is used even if we suppose that the scale factor has been
chosen in the best way (so that the rim of the integrator disc
is used for values of $\dot{x}$ near $\dot{x}_0$). If, instead, we write

(2)    $x - \frac{1+q}{2}\dot{x}_0 t = \int \left(\dot{x} - \frac{1+q}{2}\dot{x}_0\right) dt,$

the integrand will range from its maximum $\frac{1-q}{2}\dot{x}_0$ to its
minimum $-\frac{1-q}{2}\dot{x}_0$. This allows one to use a scale factor
$\frac{2}{1-q}$ times as large as in the set-up (1) and to utilize
the entire integrator disc. This, of course, means a considerable
gain.


Now the constant $\frac{1+q}{2}\dot{x}_0$ in the integral in (2)
appears only as an initial displacement. It is therefore seen
that the realization of the proposed set-up (2) requires, as
compared with the customary set-up (1), an additional gear (to
produce $\frac{1+q}{2}\dot{x}_0 t$) and an adder. The following figure shows
the simplest mechanization.

[Figure: mechanization of set-up (2), with shafts $t$, $x$, and $x - \frac{1+q}{2}\dot{x}_0 t$.]

It goes without saying that the gear ratio does not need to
be exactly $\frac{1+q}{2}\dot{x}_0$; any number near the middle of the range
of the integrand will do the same service.

If  used  to  its  fullest  extent,  the  system  as  described 
changes  a  previously  positive  variable  into  one  taking  on  also 
negative  values.    Although  only  one  change  of  sign  is  introduced 
this will introduce some new backlash. Now, if instead of (2)
we mechanize

(3)    $x - q\dot{x}_0 t = \int \left(\dot{x} - q\dot{x}_0\right) dt,$

the new integrand does not change sign, and no new backlash is
introduced. On the other hand, the optimum scale factor for
(3) is only $\frac{1}{1-q}$ times that for (1), that is to say half the
scale factor for (2). We conclude that with proper corrections
for backlash the set-up (2) should prove best. However, if
enough frontlash units are not available at Aberdeen, the set-up
(3) may be tried with advantage.
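The scale-factor gains quoted for set-ups (1), (2) and (3) can be checked numerically. The sketch below is an illustration under stated assumptions: the integrand is taken to run from q times its initial value up to that value, the usable scale factor is taken to be inversely proportional to the largest absolute value the integrand reaches, and the value q = 0.5 is chosen arbitrarily for the example. The function name `scale_gain` is hypothetical.

```python
def scale_gain(q, offset_fraction):
    """Gain in usable scale factor relative to set-up (1).

    The integrand runs from q*v0 to v0 (0 <= q < 1); a constant
    offset_fraction*v0 is subtracted before integrating. The disc
    must cover the largest absolute value of what remains, so the
    gain is v0 / max|integrand - offset|.
    """
    hi = 1.0 - offset_fraction
    lo = q - offset_fraction
    return 1.0 / max(abs(hi), abs(lo))

q = 0.5  # assumed end-to-start velocity ratio, for illustration only
print(scale_gain(q, 0.0))            # set-up (1): 1.0
print(scale_gain(q, (1 + q) / 2))    # set-up (2): 2/(1-q) = 4.0
print(scale_gain(q, q))              # set-up (3): 1/(1-q) = 2.0
```

With these assumptions the gain for (3) is exactly half that for (2), in agreement with the statement above.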

2)  A  similar  device  can  obviously  be  used  wherever 
the  range  of  the  integrand  does  not  utilize  the  integrator 
disc to its fullest extent. This is true for almost all
integrators whose outputs are:

(i) the horizontal displacement $x$,
(ii) $s = \int v\, dt$, $v$ being the speed,
(iii) $e^{-hy}$, where $y$ is the height.

In  the  first  two  cases  the  new  set-up  would  not  produce  any 
additional  loading  since  the  integrators  are  driven  by  the 
independent  variable-motor.    In  other  cases  an  additional 
loading  would  ensue  which  may  have  to  be  compensated  by  the 
use of a larger scale factor on the t-shaft; this would indirectly
slow down the machine. Whether this will have to be
done  is  impossible  to  predict  theoretically.    Should  it  prove 
necessary,  it  would  be  for  the  user  to  decide  whether  the  gain 
in  accuracy  is  worth  the  loss  in  speed. 

3) If the above described device should prove insufficient,
the following improvement may be obtained at the expense of
considerable manual work and loss of time. The process of
integration may be stopped at convenient points and the procedure
described above applied separately to the resulting
intervals. Consider, for example, an integrand of the form
indicated in the figure.

[Figure: form of the integrand.]


Here, even the usual procedure of integration utilizes the
entire range of the integrator disc and no gain can be achieved
by means of the device as described above. However, the integrand
may conveniently be treated by a double application of this
device, splitting the interval of integration into two parts.
In other words, instead of a given function $f(x)$ we integrate
the difference between $f(x)$ and a step-function. The output
of the integrator is no longer $F(x) = \int f(x)\, dx$ but the
difference between $F(x)$ and a triangular (or "roof") function.

[Figure: $f(x)$ and the subtracted step-function.]

Similarly, with a convenient subdivision we may use any step-function
for the integrand and the corresponding polygonal
line for the integral.

This  procedure  obviously  requires  resetting  the 
integrator  in  question  and  changing  one  gear  ratio  each  time 
the  machine  is  stopped.    On  the  other  hand,  the  increase  of  the 
scale factor is roughly proportional to the number of subintervals.
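The claim that the gain grows roughly in proportion to the number of subintervals can be illustrated numerically. The sketch below is a simplified model, not the procedure of the report: it assumes a hypothetical integrand f(x) = 1 - 2x on [0, 1] (one that already uses the full disc in both directions), splits the interval into n equal parts, subtracts on each part the midrange value of f there, and compares the largest remaining magnitude with the original one.

```python
def step_gain(f, a, b, n, samples=1000):
    """Scale-factor gain from subtracting an n-step function from f.

    On each of n equal subintervals of [a, b] the step value is the
    midrange of f there; the gain is max|f| / max|f - step|.
    """
    full = 0.0
    resid = 0.0
    width = (b - a) / n
    for k in range(n):
        xs = [a + k * width + width * j / samples for j in range(samples + 1)]
        vals = [f(x) for x in xs]
        mid = (max(vals) + min(vals)) / 2
        full = max(full, max(abs(v) for v in vals))
        resid = max(resid, max(abs(v - mid) for v in vals))
    return full / resid

f = lambda x: 1 - 2 * x   # assumed integrand, already using the full disc
for n in (1, 2, 4):
    print(n, round(step_gain(f, 0.0, 1.0, n), 6))
```

For this linear integrand the gain equals n; for a curved integrand it would only be roughly proportional, which is all the text asserts.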

4)    In  principle  this  procedure  may  be  looked  upon 
as  a  special  case  of  the  following  more  general  method.  Instead 
of 


(4)    $w(x) = \int y\, dx$


write


(5)    $w(x) + \varphi(x) = \int (y + \varphi')\, dx,$

where $\varphi(x)$ is an arbitrary function and $\varphi'(x)$ its derivative.
In practice, of course, $\varphi(x)$ should be chosen so as to render
the maximum of $|y + \varphi'|$ as small as possible in order to increase
the scale factor on the integrator. Now if $\varphi(x)$ is
not a linear function, the mechanization of (5) would require
two new input tables or their equivalent. However, the possibility
of obtaining some special $\varphi(x)$ by means of non-circular
gears should not be overlooked. This would mean a considerable
improvement of the linear method.
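As a check on the general method, the following short derivation shows how (5) follows from (4) and how a linear choice of the arbitrary function gives back set-up (2). It is a sketch under two assumptions that the report does not state: the arbitrary function, written $\varphi$ here, vanishes at the lower limit $a$ of integration, and the initial velocity is written $\dot{x}_0$.

```latex
\begin{align*}
w(x) + \varphi(x)
  &= \int_{a}^{x} y \, d\xi + \int_{a}^{x} \varphi'(\xi)\, d\xi
   = \int_{a}^{x} \bigl( y + \varphi' \bigr)\, d\xi ,
   \qquad \varphi(a) = 0, \\[4pt]
\varphi(t) = -\tfrac{1+q}{2}\,\dot{x}_0\, t
  &\;\Longrightarrow\;
  x - \tfrac{1+q}{2}\,\dot{x}_0\, t
   = \int_{0}^{t} \Bigl( \dot{x} - \tfrac{1+q}{2}\,\dot{x}_0 \Bigr)\, d\tau .
\end{align*}
```

A linear $\varphi$ needs only a gear and an adder, which is why the non-linear case is the one that calls for input tables or non-circular gears.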

5) We have been asked by Dr. Dederick to consider
whether it would be advantageous to generate $e^{-hy}$ from an
input table (instead of by integration, as at present). The
foregoing remarks contain an answer to this question. It is
not difficult to see that the present method of obtaining the
function by integration is more efficient. It would probably
become even more so if the recommendation 2) were put into
effect.

6)  Although  it  is  in  no  direct  connection  with  the 
subject  of  this  report,  we  enclose  an  Appendix  describing  a 
simplified  method  for  computing  gear  ratios.    This  method  is 
based  on  previous  experience  (of  one  of  us)  at  M.I.T.  and  may 
prove useful in connection with ballistic work on the Aberdeen
Analyzer.


Brown  University,  Providence,  R.I. 

and 

Bell  Telephone  Laboratories,  N.Y. 
May  27,  1943. 


W.  Feller 
C.E.  Shannon 
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A  METHOD  OF  DETERMINING  GEAR  RATIOS 


In  this  appendix  a  simplified  method  of  determining 
gear  ratios  for  an  analyzer  set  up  will  be  described  which 
was used for some time on the M.I.T. analyzer and proved in
general  to  be  considerably  faster  and  easier  to  change  than 
the  original  method  of  equalities  and  inequalities.  The 
method  may  be  briefly  outlined  as  follows: 

1.  Draw  the  set  up  with  an  unknown  gear  ratio  in 
each  shaft  of  limited  displacement.    An  unspecified 
ratio  is  also  placed  in  the  two  inputs  of  each  adder. 

2.  Calculate  an  approximate  scale  factor  on  the 
independent  variable  to  give  the  expected  time  of 
solution  at  the  average  rate  at  which  it  turns. 
Choose  an  exact  scale  factor  near  this  approximate 
one which is a "round figure" in terms of obtainable
gear ratios - i.e., factorable into a small
number of simple rationals.

3.  Choose  in  the  same  way  scale  factors  for  all 
shafts  of  limited  displacement  -  integrator  inputs 
and  function  table  inputs,  and  outputs  -  so  as  not 
to  exceed  their  limits  with  expected  displacements. 

4.  This fixes, by division, and from the integrating
factor of the integrators, the scale factors and
gear ratios of all shafts except those containing
adders. In the case of adders the input shaft with
smallest scale factor fixes the scale factor of the
adder, the other input being geared down to the same
scale factor. The output gear in the adder is then
fixed.

5.  The set up is then inspected to see that no
integrators or other parts are too heavily loaded.
If they are, reduction gears are transferred from
inputs to outputs to reduce loads when possible,
otherwise the scale factor on the independent
variable is increased.

In case the ratios come out too complicated, different
scale factors are chosen in Step 3. With a little
practice  and  foresight,  however,  it  is  possible  to  obtain 
suitable  ratios  on  the  first  trial. 
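Two of the steps above lend themselves to a small computational sketch: choosing a "round figure" near an approximate scale factor (Step 2) and fixing the scale factor of an adder (Step 4). The Python below is illustrative only. The list `STOCK` of available gear ratios is a hypothetical inventory, the limit of three gears per shaft is an assumption, and the function names are invented for the example.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Hypothetical stock of simple gear ratios; a real analyzer's
# inventory would replace this list.
STOCK = [Fraction(1, 1), Fraction(1, 2), Fraction(2, 1), Fraction(2, 3),
         Fraction(3, 2), Fraction(1, 4), Fraction(4, 1), Fraction(1, 5),
         Fraction(5, 1)]

def round_figure(target, max_gears=3):
    """Closest ratio to target that is a product of at most
    max_gears stock ratios, together with the gears used."""
    best = None
    for k in range(1, max_gears + 1):
        for combo in combinations_with_replacement(STOCK, k):
            ratio = Fraction(1)
            for g in combo:
                ratio *= g
            err = abs(ratio - target)
            if best is None or err < best[0]:
                best = (err, ratio, combo)
    return best[1], best[2]

def adder_scale(scale_a, scale_b):
    """Step 4: the input with the smallest scale factor fixes the
    adder; returns that scale and the gear-down ratio for each input."""
    s = min(scale_a, scale_b)
    return s, s / scale_a, s / scale_b

ratio, gears = round_figure(Fraction(37, 10))
print(ratio, [str(g) for g in gears])
print(adder_scale(Fraction(8), Fraction(6)))
```

Exact rational arithmetic is used so that a chosen ratio is always an exact product of stock gears, which is the sense of "factorable into a small number of simple rationals".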
Two New Circuits for Alternate Pulse Counting

The  well  known  W-Z  relay  circuit  is  shown  in 
Fig.  1.    A  is  a  pulsing  contact  which  is  alternately  opened 
and closed. Indicating closure of contacts by 0 and openness
by 1, and for relays 0 for operated (up) and 1 for
unoperated  (down)  the  circuit  goes  through  the  following 
periodic  cycle  of  operation: 


        A   W   Z

        1   1   1
        0   0   1
        1   0   0
        0   1   0
        1   1   1

Thus  one  complete  cycle  requires  two  complete  pulses  on  A. 

This  note  describes  two  apparently  new  circuits 
which  perform  the  same  function.    These  are  shown  in  Fig.  2 
and  Fig.  3.    The  operating  cycles  for  these  are: 
        Fig. 2              Fig. 3

      A   W   Z           A   W   Z

      1   0   1           1   1   1
      0   0   0           0   0   1
      1   1   0           1   0   0
      0   1   1           0   1   0
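The operating cycles can be checked mechanically for the two properties the note relies on: that one complete cycle takes two pulses on A, and that each step changes exactly one relay. The Python sketch below is illustrative. The state lists are transcribed from the tables, read row by row, which is an assumption about the original layout, and the function name `check_cycle` is hypothetical.

```python
# (A, W, Z) states transcribed from the operating cycles;
# 0 = closed contact / operated relay, 1 = open / unoperated.
CYCLES = {
    "Fig. 1": [(1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)],
    "Fig. 2": [(1, 0, 1), (0, 0, 0), (1, 1, 0), (0, 1, 1)],
    "Fig. 3": [(1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)],
}

def check_cycle(states):
    """Return (pulses_per_cycle, one_relay_per_step).

    pulses_per_cycle counts closures of A in one period; the second
    value is True when every step toggles A and exactly one relay.
    """
    n = len(states)
    pulses = sum(1 for s in states if s[0] == 0)
    single = all(
        states[i][0] != states[(i + 1) % n][0]
        and sum(p != q for p, q in zip(states[i][1:], states[(i + 1) % n][1:])) == 1
        for i in range(n)
    )
    return pulses, single

for name, states in CYCLES.items():
    print(name, check_cycle(states))   # each prints (2, True)
```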

These  three  circuits  may  be  compared  with  regard 
to  the  number  of  elements  required  as  follows: 

            Relays   Contacts                     Resistances

Figure 1      2      1 continuity, 1 transfer          2

Figure 2      2      2 continuity, 1 break             1

Figure 3      2      2 transfer, 1 make                1

In Fig. 3 the resistance is theoretically superfluous;
if the transfer elements could be trusted never to be shorted
it could be omitted, but in practice it would be necessary to
avoid shorts when the relays were being adjusted. Figs. 2 and
3  are  essentially  duals,  and  3  was  obtained  from  2  by  the 
duality  theorem. 

In Fig. 2 it may be noted that the two relays are
up when A is closed, while in the standard circuit they are both
up when A is open. This might be desirable in some applications.
Fig.  3  has  the  possible  disadvantage  that  both  ends  of  the 
pulsing  contact  A  are  connected  into  the  circuit,  while  in  1 
and  2  one  end  can  be  grounded. 
Counting Up or Down with Pulse Counters


With binary counters of either relay or electronic
type it is possible by simple modification to count both up and
down. Suppose the largest number that can be registered is L.
Defining the complement of any number N ≤ L by L - N, we note
that subtracting a number M from N is equivalent to adding M to its
complement and taking the complement of the result. Thus if in a
binary counter we take the complement of a reading (which means
locking up the relays which are down and vice versa in the relay
case, and putting out the tubes which are conducting and vice versa
in the electronic case) and then let the counter continue and add
the number of pulses in question, and finally take the complement
again, we have subtracted the number. Actually, however, this
process can be done simply by transferring the carryover leads to
the opposite digit (tube or relay). In the relay case this amounts
to a transfer contact between each adjacent pair of digits, and an
additional make contact. In the electronic case the carryover leads
go from the "one" tube plate to grids on the next stage. Here we
could insert an electronic transfer contact, as shown, for example,
in Figure 1. When we wish to add, the common control lead for "add"
is given cutoff voltage, the "subtract" lead a large negative voltage.
A positive impulse on the "one" plate of a stage then causes
one side of the double triode to conduct, giving a negative impulse
to the next grid, for a carryover. For subtraction the voltages
on the control leads are reversed and carryover occurs when the
"zero" plate voltage increases, i.e., when this tube goes out.

C. E. SHANNON
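The two ways of counting down described in this note (complement, count up, complement again; or transfer the carryover to the opposite side of each digit) can be compared in a short simulation. The Python sketch below is illustrative only: it models the counter as a list of binary digits, least significant first, and the four-digit width and the function names are assumptions made for the example.

```python
def complement(n, bits):
    """Complement with respect to L = 2**bits - 1 (flip every digit)."""
    return (2 ** bits - 1) - n

def count_down_by_complement(n, pulses, bits):
    """Subtract by complementing, counting up, and complementing again."""
    c = complement(n, bits)
    c = (c + pulses) % (2 ** bits)      # ordinary up-counting
    return complement(c, bits)

def count(digits, up=True):
    """Apply one pulse to a ripple counter, least significant digit first.

    Counting up, a digit carries when it goes 1 -> 0; counting down,
    the carry lead is taken from the opposite side, so a digit
    carries when it goes 0 -> 1.
    """
    digits = list(digits)
    for i in range(len(digits)):
        digits[i] ^= 1
        carried = (digits[i] == 0) if up else (digits[i] == 1)
        if not carried:
            break
    return digits

d = [1, 0, 1, 0]            # the number 5, least significant digit first
for _ in range(3):
    d = count(d, up=False)
print(d)                                  # [0, 1, 0, 0], i.e. 2
print(count_down_by_complement(5, 3, 4))  # 2
```

Both routes give the same reading, which is the equivalence the note uses to justify the transfer-contact modification.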


C-»f  A  (9-4*) 


Cover  Sheet  for  Technical  Memoranda 
Research  Department 


Subject: Circuits for a P.C.M. Transmitter and Receiver -
Case 20878


ROUTING: 

"    S.A.S.,H.W.B.,  H.F. 

2    --  CASE  FILES 

*  G.W.Gilman 
5  -H.W.Bode 
s    A. G. Jensen 
->  W.M.Goodall 

8  E.Peterson 

9  H.S.Black

10  -W.F.Simpson  -  Patent  Dept. 

11-  J.  H.Pierce 

12-  R.L.Dietzold 

13-  £.B  Zeldman  t$55$£^L 

14-  W.T.Wintringham 

15-  F.B.Llewellyn 

16-  C.H.Elmendorf 

17-  B.  M.Oliver 

1 8-  C.E.  Shannon 


MM-44-110-37
DATE: June 1, 1944
AUTHORS: C.E. Shannon and
B.M. Oliver


ABSTRACT 


Circuits  are  described  for  a  P. CM.  transmitter 
and  receiver.    The  transmitter  operates  on  the  principle 
of  counting  in  the  binary  system  the  number  of  quanta 
of charge required to nullify the sampled voltage.
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MEMORANDUM  FOR  FILE 


The  circuits  shown  in  the  present  memorandum  are 
intended  to  fill  the  boxes  of  the  block  functional  designs 
for  a  PCM  transmitter  and  receiver  shown  in  Fig.  6  of  a  December 
1943 memorandum (MM-43-110-43). The transmitter functional
diagram  is  shown  here  as  Fig.  1  and  the  general  operation 
is  as  follows.    The  incoming  signal  is  sampled  periodically 
by  closing  the  electronic  switch  1  with  periodic  impulses 
from  the  timer.    This  charges  condenser  C  to  the  sampled 
voltage  and  the  electronic  switch  opens  after  each  impulse 
isolating  the  condenser  from  the  signal.    The  existence  of 
a voltage across the condenser causes the comparator to close
electronic  switch  2  which  allows  pulses  of  charge  to  feed 
into  the  condenser  from  the  pulse  generator,  discharging  the 
condenser.    The  number  of  these  pulses  is  counted  in  the 
binary  system  by  the  binary  counter  and  when  the  condenser 
is reduced to a reference voltage, the comparator opens electronic
switch 2. Near the end of the sampling period the
binary counter is connected to the distributer which registers
the binary number counted, and the counter is then reset to
zero; both of these operations are controlled by impulses from the
timer.    The  distributer  then  sends  a  series  of  pulses  or  not 
down  the  output  line  according  as  the  binary  digits  are 
1  or  0.    These  digits  are  sent  in  reverse  order,  the  least 
important  being  sent  first,  to  tie  in  with  the  contemplated 
receiver  circuit. 
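The transmitter's principle, counting the quanta of charge needed to bring the condenser down to the reference voltage and then sending the binary digits least important first, can be sketched in a few lines of Python. This is an idealized illustration, not a model of the tube circuits: the function name, the quantum size, the zero reference, and the choice of five digits are assumptions made for the example.

```python
def encode_sample(voltage, quantum, reference=0.0, digits=5):
    """Count the charge quanta needed to bring the condenser voltage
    down to the reference, then list the count's binary digits with
    the least important digit first."""
    count = 0
    limit = 2 ** digits - 1
    while voltage > reference and count < limit:
        voltage -= quantum      # one pulse of charge removed
        count += 1              # binary counter advances
    return [(count >> i) & 1 for i in range(digits)]

print(encode_sample(2.3, quantum=0.25))   # 10 quanta -> [0, 1, 0, 1, 0]
```

The count is capped at the largest number the counter can hold, so an over-range sample saturates instead of wrapping around; that cap is a design choice of the sketch.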

The  specific  circuits  are  shown  in  Figs.  2  to  8,  and 
detailed  descriptions  of  their  operation  follow. 

Fig.  2  shows  the  electronic  switch  1  which  charges  the 
condenser  C  to  the  signal  voltage  at  the  sampling  times.  The 
signal  wave  is  biased  up  so  that  its  minimum  value  is  slightly 
positive, and impressed on terminal 1 as a voltage; i.e., the
signal source as seen from terminal 1 is assumed to be of low
impedance. The timer, at the sampling time, puts a positive
pulse on terminal 2, which is inverted by the triode to give
a negative pulse on the pentode control grid. This causes the
pentode, which was previously conducting, to cut off. Before
the pulse, condenser C had a small minimum positive charge
and  neither  diode  was  conducting  since  the  plates  were  held 
at a low positive potential by the pentode current. As the
pentode cuts off, the diode plates swing positive and the right
hand diode starts to conduct, charging the condenser. As this
condenser voltage builds up exponentially the voltage on the
diode plates also increases positively until it reaches the
signal voltage, and at that instant the left hand diode starts
to conduct. The voltage stops rising at this point since the
plates are now essentially short circuited to the low impedance
signal source. This all occurs during the timing pulse, and
at the end of this pulse the pentode again starts conducting,
dropping the diode plates to a small positive voltage, less
than the minimum signal voltage, and isolating the condenser.

Fig.  3  shows  a  standard  multi-vibrator  circuit  for 
giving  a  series  of  square  pulses.    The  coil  condenser  cross 
connection  of  plates  to  grids  causes  the  grid  transient  to 
be  a  cosine  curve  which  crosses  the  cut  off  grid  voltage  at 
a  time  determined  essentially  by  the  LC  product  and  independent 
of  amplitude  changes  due  to  variations  in  plate  supply,  etc. 
As  this  point  determines  the  period  of  oscillation,  the 
oscillator  has  good  frequency  stability.    The  output  appears 
on  terminal  6  as  a  square  wave. 

Fig.  4  is  the  comparator,  which  is  actually  only  a 
differential  amplifier  with  sufficient  gain  so  that  the 
granularity  voltage  applied  to  the  input  is  capable  of 
driving  the  amplifier  from  saturation  in  one  direction  to 
saturation  in  the  other.    The  input  is  the  voltage  on  condenser 
C  which  immediately  after  a  sampling  instant,  will  be  at  the 
sampled  signal  voltage.    This  voltage  starts  decreasing  by 
steps  as  the  condenser  is  discharged  and  when  the  condenser 
voltage  applied  to  terminal  3  moves  down  the  step  which  crosses 
the  differential  amplifier  threshold,  the  amplifier  swings  from 
saturation  with  output  terminal  5  at  nearly  zero  voltage  to 
a  high  negative  voltage. 

The  electronic  switch  2  is  shown  in  Fig.  5.  This 
circuit  sends  units  of  charge  into  the  condenser  through 
terminal  3  under  the  control  of  the  comparator  output  coming 
in  on  terminal  5.    The  multi-vibrator  output  is  connected  to 
terminal  6  and  the  output  of  the  multi-grid  tube  will  be  a 
square  wave  when  5  is  positive,  which  ceases  when  the 
comparator  swings  to  the  other  saturation  point  driving  the 
voltage  on  5  in  the  negative  direction.    The  double  diode 
connection  gives  a  pump  action.    When  the  plate  voltage  of 
the  multi-grid  tube  increases  to  the  upper  part  of  the  square 
wave,  the  charge  flows  into  the  condenser  from  terminal  4 
through the left diode. During the lower part of this wave
the condenser discharges through the right diode out into the
condenser  C,  via  terminal  3.    As  this  causes  the  potential  of 
3  to  decrease  gradually  down  a  step  function,  it  is  necessary 
for  the  input  voltage  at  4  to  decrease  similarly;  otherwise 
the  difference  in  voltage  between  3  and  4  would  cause  the  size 
of  quanta  to  decrease  gradually.    This  lowering  of  the  voltage 
on  4  is  accomplished  by  a  cathode  follower  arrangement  on  the 
first  cathodes  in  the  comparator,  which  follow  the  step  voltage 
down. 

The  binary  counter  is  shown  in  Fig.  6.    The  descending 
step  voltage  which  appears  on  condenser  C  is  applied  to  the 
input  of  this  circuit  through  terminal  3.    The  input  resistance 
condenser  combination  serves  as  a  differentiating  circuit  (the 
time  constant  fairly  small  compared  to  the  time  between  steps) 
so  that  the  voltage  applied  to  the  first  grid  of  the  double 
triode  consists  of  a  series  of  negative  spikes.    The  double 
triode  is  simply  a  two  stage  resistance  coupled  amplifier,  and 
its  output  feeds  the  binary  counter  digit  tubes.    This  circuit 
is  of  standard  type  with  two  pentodes  in  each  stage  and  there 
are  two  stable  points  for  each  stage,  one  with  the  upper  tube 
cut off and the lower tube conducting, and the other, the converse
situation. A negative impulse from a preceding stage
applied  through  the  coupling  condensers  changes  the  state  from 
the  previous  stable  condition  to  the  opposite  one.    This  impulse 
is  applied  symmetrically  to  both  suppressors,  but  the  condenser 
across  the  cathode  resistances,  charged  in  one  direction  from 
the  previous  state,  biases  the  choice  of  the  next  state  toward 
the  opposite  one.    The  control  grids  of  the  "zero"  tubes  (the 
upper  row  which  are  conducting  when  the  corresponding  binary 
digits  are  zero)  are  connected  to  a  common  control  lead  which 
is used to reset the reading to zero after the reading is registered
by the distributer. This is accomplished by a negative
impulse from the timer. The outputs to the distributer
are  taken  off  the  plates  of  the  "unit"  tubes. 

The distributer is shown in Fig. 7. After the
number  of  quanta  of  charge  has  been  counted  in  the  binary 
counter,  the  leads  11,  12,  13,  14,  15  will  have  either  low 
positive  voltages  or  B+,  according  as  the  corresponding  digit 
is one or zero. The grids of the left triodes will then be
either  negative  or  positive  from  the  potentiometer  action 
to  the  negative  voltage  C-.    To  register  the  counter  reading, 
a  positive  pulse  from  the  timer  is  applied  to  the  control 
grid  of  the  common  pentode  allowing  it  to  conduct  and  pulling 
the  cathode  of  the  left  triode  and  the  diode  in  all  stages 
negatively.    If  a  digit  is  zero,  the  potential  of  the  cathodes 
in  that  stage  stops  at  a  positive  value  due  to  current  through 
the  triode  and  the  diode  does  not  conduct.    If  the  digit  is 
one the cathodes are pulled negative and the corresponding
condenser $C_0$ is discharged through the diode and pentode.
At the end of the registering pulse, the cathodes go positive
again, isolating each $C_0$, with the digit registered as
presence or absence of charge. The reading is taken off the
series of condensers $C_0$ in sequence by positive pulses from
the timer on leads 21, 22, 23, 24, 25. These pulses allow
the right hand triodes to conduct and each $C_0$ in turn to
charge through the output lead, leaving them in the normal
state (at a voltage about equal to the pulse voltage). If
the digit is "zero" no charge of $C_0$ from the output lead
occurs. Thus negative pulses appear on the output when and
only when the registered digits are one.

The  timer  system  is  shown  in  Fig.  8.    An  oscillator 
which  may  be  synchronized  subharmonically  with  the  pulse 
generating  multi-vibrator,  operates  at  the  sampling  frequency. 
This  passes  through  the  clipper  amplifier  to  give  a  square 
wave,  which  is  differentiated  to  give  alternating  positive 
and  negative  spikes.    A  second  clipper  amplifier  eliminates 
the  negative  spikes  and  makes  the  positive  ones  rectangular. 
These  short  rectangular  pulses  are  fed  into  a  delay  line 
terminated  in  its  characteristic  impedance.    The  timing  pulses 
needed  for  the  various  circuit  functions  are  tapped  off  at 
the  appropriate  places  as  indicated.    A  synchronizing  pulse 
may  also  be  taken  off  the  same  delay  line. 

Fig.  9  shows  the  receiver  circuit.    The  signal 
passes  through  the  clipping  amplifier  which  is  adjusted  to  give 
a  saturation  voltage  on  the  output  if  a  pulse  is  present  and 
none  if  absent.    This  output  is  applied  to  the  grid  of  a 
multigrid  pentode,  whose  other  control  grid  is  given  positive 
gating  pulses  at  the  center  of  the  digit  intervals.  These 
gating  pulses  allow  the  pentode  to  conduct  if  a  pulse  is  present 
and  the  plate  current  is  then  independent  of  the  plate  voltage 
(providing  this  stays  within  certain  limits)  so  that  if  a 
pulse  is  present,  a  fixed  amount  of  charge  (equal  to  the 
length  of  the  gate  times  the  pentode  current)  flows  onto  the 
condenser.    The  time  constant  of  the  R  C  system  (including  the 
pentode  load  resistance)  is  adjusted  to  allow  the  voltage  to 
restore  itself  halfway  toward  the  equilibrium  value  in  the 
time  from  one  digit  to  the  next,  so  that  after  all  pulses 
have been collected on the condenser, the charge contributions
of the first, second, third, etc. have decayed by factors of
1/2^{n-1}, 1/2^{n-2}, \ldots, 1 (for a group of n digits). At
this time a positive gating pulse is put
on the grid of the second pentode, allowing the condenser to
discharge  rapidly  into  the  low  pass  filter.    The  timer  system 
can  be  realized  with  the  systems  shown  in  either  Fig.  10  or 
Fig.  11. 
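
The halving of the stored voltage between digits is what lets a single condenser weight the pulses of a group in binary fashion. The following is a minimal sketch only, assuming an ideal unit of charge per pulse, an equilibrium value of zero, exact halving in each digit interval, and read-out immediately after the last digit. The function names are illustrative and do not come from the memorandum.

```python
def accumulate_charge(digits):
    """Return the condenser charge after a group of 0/1 digits has arrived."""
    charge = 0.0
    for position, digit in enumerate(digits):
        if position > 0:
            charge *= 0.5      # stored charge decays halfway during one digit interval
        charge += digit        # a pulse adds one fixed unit of charge; no pulse adds nothing
    return charge


def binary_value(digits):
    """Value of the group read as a binary number, first digit least significant."""
    return sum(digit << position for position, digit in enumerate(digits))


group = [1, 0, 1, 1, 0]
charge = accumulate_charge(group)
```

Under these assumptions the final charge is the binary value of the group divided by 2^(n-1), so the first digit received carries the least weight.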
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ABSTRACT 


A mathematical theory of secrecy systems is
developed. Three main problems are considered. (1) A
logical formulation of the problem and a study of the
mathematical structure of secrecy systems. (2) The
problem of "theoretical secrecy," i.e., can a system be
solved given unlimited time and how much material must
be intercepted to obtain a unique solution to cryptograms.
A secrecy measure called the "equivocation" is defined
and its properties developed. (3) The problem of
"practical secrecy." How can systems be made difficult
to solve, even though a solution is theoretically

Introduction and Summary

In the present paper a mathematical theory of
cryptography and secrecy systems is developed. The entire
approach is on a theoretical level and is intended to supplement
the treatment found in standard works on cryptography.*
There, a detailed study is made of the many standard types of
codes and ciphers, and of the ways of breaking them. We will
be more concerned with the general mathematical structure and
properties of secrecy systems.

The presentation is mathematical in character. We
first define the pertinent terms abstractly and then develop
our results as lemmas and theorems. Proofs which do not
contribute to an understanding of the theorems have been placed
in the appendix.

The mathematics required is drawn chiefly from
probability theory and from abstract algebra. The reader is
assumed to have some familiarity with these two fields. A
knowledge of the elements of cryptography will also be
helpful although not required.

The treatment is limited in certain ways. First,
there are two general types of secrecy system: (1) concealment
systems, including such methods as invisible ink,
concealing a message in an innocent text, or in a fake covering
cryptogram, or other methods in which the existence of the
message is concealed from the enemy; (2) "true" secrecy systems
where the meaning of the message is concealed by cipher, code,
etc., although its existence is not hidden. We consider only
the second type--concealment systems are more of a psychological
than a mathematical problem. Secondly, the treatment is limited
to the case of discrete information, where the information to
be enciphered consists of a sequence of discrete symbols, each
chosen from a finite set. These symbols may be letters in a
language, words of a language, amplitude levels of a "quantized"
speech or video signal, etc., but the main emphasis and
thinking has been concerned with the case of letters. A preliminary
survey indicates that the methods and analysis can be
generalized to study continuous cases, and to take into account the
special characteristics of speech secrecy systems.

*See, for example, H. F. Gaines, "Elementary Cryptanalysis,"
or M. Givierge, "Cours de Cryptographie."

The paper is divided into three parts. The main
results of these sections will now be briefly summarized. The
first part deals with the basic mathematical structure of
language and of secrecy systems. A language is considered for
cryptographic purposes to be a stochastic process which
produces a discrete sequence of symbols in accordance with some
system of probabilities. Associated with a language there
is a certain parameter D which we call the redundancy of the
language. D measures, in a sense, how much a text in the
language can be reduced in length without losing any
information. As a simple example, if each word in a text is repeated
a reduction of 50 per cent is immediately possible. Further
reductions may be possible due to the statistical structure of
the language, the high frequencies of certain letters or words,
etc. The redundancy is of considerable importance in the study
of secrecy systems.

A secrecy system is defined abstractly as a set of
transformations of one space (the set of possible messages)
into a second space (the set of possible cryptograms). Each
transformation of the set corresponds to enciphering with a
particular key and the transformations are supposed reversible
(non-singular) so that unique deciphering is possible when the
key is known.

Each key and therefore each transformation is assumed
to have an a priori probability associated with it--the
probability of choosing that key. The set of messages or message
space is also assumed to have a priori probabilities for the
various messages, i.e., to be a probability or measure space.

In the usual cases the "messages" consist of sequences
of "letters." In this case as noted above the message space is
represented by a stochastic process which generates sequences of
letters according to some probability structure.

These probabilities for various keys and messages are
actually the enemy cryptanalyst's a priori probabilities for
the choices in question, and represent his a priori knowledge
of the situation. To use the system a key is first selected
and sent to the receiving point. The choice of a key determines
a particular transformation in the set forming the system. Then
a message is selected and the particular transformation applied
to this message to produce a cryptogram. This cryptogram is
transmitted to the receiving point by a channel that may be
intercepted by the enemy. At the receiving end the inverse
of the particular transformation is applied to the cryptogram
to recover the original message.
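
The sequence just described (select a key, apply the corresponding transformation, apply its inverse at the receiving end) can be sketched for simple substitution. This is an illustration only, assuming a 26-letter upper-case alphabet with other characters passed through unchanged. The function names and the seeded random generator are hypothetical choices made for the example.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def choose_key(rng):
    """Pick one substitution alphabet; every permutation is equally likely."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return "".join(letters)


def encipher(key, message):
    """Apply the transformation selected by `key` to the message."""
    return message.translate(str.maketrans(ALPHABET, key))


def decipher(key, cryptogram):
    """Apply the inverse transformation to recover the message."""
    return cryptogram.translate(str.maketrans(key, ALPHABET))


rng = random.Random(1945)          # fixed seed so the example is repeatable
key = choose_key(rng)
message = "SECRECY SYSTEMS"
cryptogram = encipher(key, message)
recovered = decipher(key, cryptogram)
```

Because each key is a permutation of the alphabet, the transformation is reversible and `recovered` equals `message` for every key.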

If the enemy intercepts the cryptogram he can
calculate from it the a posteriori probabilities of the various
possible messages and keys which might have produced this
cryptogram. This set of a posteriori probabilities constitutes
his knowledge of the key and message after the interception.*
The calculation of these a posteriori probabilities is the
generalized problem of cryptanalysis.
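
For a very small system the a posteriori probabilities can be computed by direct enumeration. The sketch below is a toy illustration, not a method taken from the paper: it assumes a shift cipher with 26 equally likely keys and an invented message space of three words with invented a priori probabilities.

```python
from fractions import Fraction

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def shift(message, k):
    """Encipher by advancing every letter k places in the alphabet."""
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in message)


def posterior(prior, cryptogram):
    """A posteriori message probabilities given an intercepted cryptogram."""
    key_prob = Fraction(1, 26)
    joint = {}
    for m, pm in prior.items():
        # probability that m was chosen AND some key turned it into the cryptogram
        joint[m] = pm * sum(key_prob for k in range(26) if shift(m, k) == cryptogram)
    total = sum(joint.values())
    return {m: p / total for m, p in joint.items()}


# hypothetical a priori probabilities for a three-message space
prior = {"ADD": Fraction(5, 10), "BEE": Fraction(3, 10), "CAB": Fraction(2, 10)}
post = posterior(prior, "CFF")
```

Here "CFF" can arise from "ADD" or "BEE" but not from "CAB", so the interception removes one message and leaves the other two in the ratio of their a priori probabilities.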

As an example of these notions, in a simple
substitution cipher with random key there are 26! transformations,
corresponding to the 26! ways we can substitute for 26
different letters. These are all equally likely and each
therefore has an a priori probability 1/26!. If this is applied
to "normal English" the cryptanalyst being assumed to have no
knowledge of the message source other than that it is English,
the a priori probabilities of various messages of N letters
are merely their frequency in normal English text.

If the enemy intercepts N letters of cryptogram in
this system his probabilities change. If N is large enough
(say 50 letters) there is usually a single message of a posteriori
probability nearly unity, while all others have a total
probability nearly zero. Thus there is an essentially unique
"solution" to the cryptogram. For N smaller (say N = 15) there will
usually be many messages and keys of comparable probability,
with no single one nearly unity. In this case there are multiple
"solutions" to the cryptogram.

Considering a secrecy system to be a set of
transformations of one space into another with definite probabilities
associated with each transformation, there are two natural
combining operations which produce a third system from two given
systems. The first combining operation is called the product
operation and corresponds to enciphering the message with the
first system R and enciphering the resulting cryptogram with
system S, the keys for R and S being chosen independently.
This total operation is a secrecy system whose transformations
consist of all the products (in the usual sense of products of
transformations) of transformations in S with transformations
in R. The probabilities are the products of the probabilities
for the two transformations.

The second combining operation is "weighted addition":

    T = pR + qS,    p + q = 1.

It corresponds to making a preliminary choice as to whether
system R or S is to be used with probabilities p and q,
respectively. When this is done R or S is used as originally defined.

*"Knowledge" is thus identified with a set of propositions having
associated probabilities. We are here at variance with the
doctrine often assumed in philosophical studies which considers
knowledge to be a set of propositions which are either true or
false.
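
Both combining operations can be tried out on small systems. The sketch below is an illustration under stated assumptions: a system is represented as a dictionary mapping each transformation (a permutation of four symbols, written as a tuple) to its probability, and the two example systems R and S are invented for the purpose.

```python
from collections import defaultdict
from fractions import Fraction


def product(R, S):
    """Encipher with R, then with S, the two keys being chosen independently."""
    T = defaultdict(Fraction)
    for r, pr in R.items():
        for s, ps in S.items():
            composed = tuple(s[r[x]] for x in range(len(r)))   # r first, then s
            T[composed] += pr * ps
    return dict(T)


def weighted_sum(p, R, q, S):
    """T = pR + qS: use R with probability p and S with probability q."""
    assert p + q == 1
    T = defaultdict(Fraction)
    for r, pr in R.items():
        T[r] += p * pr
    for s, ps in S.items():
        T[s] += q * ps
    return dict(T)


# R: the four cyclic shifts of {0,1,2,3}; S: identity or reversal (both hypothetical)
R = {tuple((x + k) % 4 for x in range(4)): Fraction(1, 4) for k in range(4)}
S = {(0, 1, 2, 3): Fraction(1, 2), (3, 2, 1, 0): Fraction(1, 2)}

RS = product(R, S)
T = weighted_sum(Fraction(1, 3), R, Fraction(2, 3), S)
```

In this example the product has eight distinct transformations, each of probability 1/8, while the weighted sum has five, the identity collecting probability from both R and S.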

It is shown that secrecy systems with these two
combining operations form essentially a "linear associative algebra"
with a unit element, an algebraic variety that has been
extensively studied by mathematicians. Some of the properties of
this algebra are developed.

Among the many possible secrecy systems there is one
type with many special properties. This type we call a "pure"
system. A system is pure if for any three transformations T_i,
T_j, T_k in the set the product

    T_i T_j^{-1} T_k

is also a transformation in the set, and all keys are equally
likely. That is, enciphering, deciphering, and enciphering with
any three keys must be equivalent to enciphering with some key.

With a pure cipher it is shown that all keys are
essentially equivalent--they all lead to the same set of a
posteriori probabilities. Furthermore, when a given cryptogram
is intercepted there is a set of messages that might have
produced this cryptogram (a "residue class") and the a posteriori
probabilities of messages in this class are proportional to the
a priori probabilities. All the information the enemy has
obtained by intercepting the cryptogram is a specification of the
residue class. Many of the common ciphers are pure systems,
including simple substitution with random key. In this case
the residue class consists of all messages with the same pattern
of letter repetitions as the intercepted cryptogram.
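
Two of the statements above lend themselves to a small check: the closure condition defining a pure system, and the pattern of letter repetitions that identifies a residue class under simple substitution. The sketch below is illustrative only. Transformations are written as permutation tuples, the keys are assumed equally likely, and the example sets and function names are invented.

```python
from itertools import product as all_triples


def compose(*perms):
    """Compose permutations; the first one listed is applied first."""
    out = tuple(range(len(perms[0])))
    for p in perms:
        out = tuple(p[x] for x in out)
    return out


def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def is_closed_under_triple_product(transformations):
    """True if T_i T_j^{-1} T_k lies in the set for every choice of i, j, k."""
    ts = set(transformations)
    # T_i T_j^{-1} T_k acts on a message as: T_k first, then T_j^{-1}, then T_i
    return all(compose(tk, inverse(tj), ti) in ts
               for ti, tj, tk in all_triples(ts, repeat=3))


def repetition_pattern(text):
    """Label each letter by order of first appearance, e.g. 'ADD' -> (0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(c, len(first_seen)) for c in text)


shifts = [tuple((x + k) % 5 for x in range(5)) for k in range(5)]
not_pure = [(0, 1, 2), (1, 0, 2), (1, 2, 0)]
```

The five cyclic shifts pass the closure test, while the three-element set `not_pure` fails it. Any simple substitution of "ATTACK" has the same repetition pattern as "ATTACK" itself, which is all an interceptor learns in that system.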

Two systems R and S are defined to be "similar" if
there exists a fixed transformation A with an inverse A^{-1} such
that

    R = AS.

If R and S are similar, a one-to-one correspondence between the
resulting cryptograms can be set up leading to the same a
posteriori probabilities. The two systems are cryptanalytically the
same.

The second main part of the paper deals with the
problem of "theoretical security." How secure is a system against
cryptanalysis when the enemy has unlimited time and manpower
available for the analysis of intercepted cryptograms?

"Perfect Secrecy" is defined by requiring of a system
that after a cryptogram is intercepted by the enemy the a
posteriori probabilities of this cryptogram representing various
messages be identically the same as the a priori probabilities
of the same messages before the interception. It is shown that
perfect secrecy is possible but requires, if the number of
messages is finite, the same number of possible keys--if the
message is thought of as being constantly generated at a given
"rate" R (to be defined later), key must be generated at the
same or a greater rate.
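
The definition can be tested directly on a toy system. The sketch below assumes messages and keys that are residues modulo n, enciphered by addition, with equally likely keys. This construction and the a priori probabilities are chosen for illustration and are not taken from the paper.

```python
from fractions import Fraction


def posteriors(prior, keys, n):
    """Return {cryptogram: {message: P(message | cryptogram)}} for E = (M + K) mod n."""
    key_prob = Fraction(1, len(keys))
    joint = {}
    for m, pm in prior.items():
        for k in keys:
            e = (m + k) % n
            joint.setdefault(e, {}).setdefault(m, Fraction(0))
            joint[e][m] += pm * key_prob
    return {e: {m: p / sum(row.values()) for m, p in row.items()}
            for e, row in joint.items()}


def is_perfect(prior, keys, n):
    """True if every a posteriori probability equals the a priori one."""
    post = posteriors(prior, keys, n)
    return all(row.get(m, 0) == pm
               for row in post.values() for m, pm in prior.items())


# hypothetical a priori probabilities for three messages
prior = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
full_key = is_perfect(prior, keys=[0, 1, 2], n=3)    # as many keys as messages
short_key = is_perfect(prior, keys=[0, 1], n=3)      # fewer keys than messages
```

With three keys the interception tells the enemy nothing; with only two keys each cryptogram rules out one message, so the a posteriori probabilities differ from the a priori ones.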

If a secrecy system with a finite key is used, and N
letters of cryptogram intercepted, there will be, for the enemy,
a certain set of messages with certain probabilities, that this
cryptogram could represent. As N increases the field usually
narrows down until eventually there is a unique "solution" to
the cryptogram--one message with probability essentially unity
while all others are practically zero. A quantity Q(N) is
defined, called the equivocation, which measures in a statistical
way how near the average cryptogram of N letters is to a unique
solution; that is, how uncertain the enemy is of the original
message after intercepting a cryptogram of N letters. Various
properties of the equivocation are deduced--for example, the
equivocation of the key never increases with increasing N.
This quantity Q is a theoretical secrecy index--theoretical in
that it allows the enemy unlimited time to analyse the cryptogram.

The function Q(N) for a certain idealized type of
cipher called the random cipher is determined. With certain
corrections this function can be applied to many cases of
practical interest. This gives a way of calculating approximately
how much intercepted material is required to obtain a solution
to a secrecy system. It appears from this analysis that with
ordinary languages and the usual types of ciphers (not codes)
this "unicity distance" is approximately |K|/D. Here |K| is a
number measuring the "size" of the key space. If all keys are
a priori equally likely |K| is the logarithm of the number of
possible keys. D is the redundancy of the language and measures
the excess information content of the language. In simple
substitution with random key on English |K| is log_10 26! or about
20 and D is about .7 for English. Thus unicity occurs at about
30 letters.
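
The estimate |K|/D is easy to evaluate numerically. The sketch below assumes logarithms to base ten and the redundancy D = .7 quoted above; the function name is illustrative. Evaluated exactly, log_10 26! comes to about 26.6 rather than the round figure quoted in the text, so this calculation gives a distance nearer 38 letters. The order of magnitude is the same.

```python
import math


def unicity_distance(number_of_keys, redundancy):
    """Approximate letters of cryptogram needed for a unique solution: |K| / D.

    number_of_keys -- count of equally likely keys
    redundancy     -- D, in decimal digits per letter
    """
    key_size = math.log10(number_of_keys)     # |K| in decimal digits
    return key_size / redundancy


key_size = math.log10(math.factorial(26))     # simple substitution on 26 letters
letters_needed = unicity_distance(math.factorial(26), 0.7)
```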

It is possible to construct secrecy systems with a
finite key for certain "languages" in which the function Q(N)
does not approach zero as N → ∞. In this case, no matter how
much material is intercepted, the enemy still does not get a
unique solution to the cipher but is left with many
alternatives, all of reasonable probability. Such systems we call
ideal systems. It is possible in any language to approximate
such behavior--i.e., to make the approach to zero of Q(N) recede
out to arbitrarily large N. However, such systems have a
number of drawbacks, such as complexity and sensitivity to
errors in transmission of the cryptogram.

The third part of the paper is concerned with
"practical secrecy." Two systems with the same key size may both
be uniquely solvable when N letters have been intercepted, but
differ greatly in the amount of labor required to effect this
solution. An analysis of the basic weaknesses of secrecy
systems is made. This leads to methods for constructing systems
which will require a large amount of work to solve. A certain
incompatibility among the various desirable qualities of
secrecy systems is discussed.


PART  I 

FOUNDATIONS  AND  ALGEBRAIC  STRUCTURE  OF  SECRECY  SYSTEMS 


1.    Choice, Information and Uncertainty

Suppose we have a set of possible events whose
probabilities of occurrence are p_1, p_2, ..., p_n. These probabilities
are known, but that is all we know concerning which event will
occur. Can we define a quantity which will measure in some
sense how "uncertain" we are of the outcome? How much "choice"
is involved in the selection of the event by the chance element
that operates with these probabilities? We propose as a
numerical measure of this rather vague notion the quantity

    H = -\sum_{i=1}^{n} p_i \log p_i.

There are many reasons for this particular formula. Quantities
of this kind appear continually in the present paper and in the
study of the transmission of information.
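
The quantity H is simple to compute. The following is a minimal sketch, assuming logarithms to base ten by default and taking 0 log 0 as zero; the function name is illustrative.

```python
import math


def uncertainty(probs, base=10):
    """H = -sum p_i log p_i, with zero-probability events contributing nothing."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to unity"
    return -sum(p * math.log(p, base) for p in probs if p > 0)


h_certain = uncertainty([1.0, 0.0, 0.0])          # outcome known in advance
h_uniform = uncertainty([0.25] * 4, base=2)       # four equally likely events
h_skewed = uncertainty([0.7, 0.1, 0.1, 0.1], base=2)
```

The certain case gives zero, the equally likely case gives log n, and any unequal assignment over the same n events gives something in between.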

To justify this definition we will state a number of
properties that follow from it. These properties will not be
proved here,* but are easily deduced from the definition.

* It is intended to develop these results in coherent fashion
in a forthcoming memorandum on the transmission of information.

Properties of H = -\sum p_i \log p_i.

1.  H = 0 if and only if all the p_i but one are zero, this
one having the value unity. Thus only when we are certain
of the outcome does H vanish.

2.  For a given n, H is a maximum and equal to log n if and
only if all the p_i are equal (i.e., 1/n). This is also
intuitively the most uncertain situation.

3.  Suppose there are two events in question, with m
possibilities for the first and n for the second. Let p_{ij} be
the probability of the joint occurrence of i for the first
and j for the second. The uncertainty of the joint event
is

    H = -\sum_{i,j} p_{ij} \log p_{ij}.

For given probabilities p_i = \sum_j p_{ij} for the first and
q_j = \sum_i p_{ij} for the second, the quantity H is maximized if
and only if the events are independent, i.e., p_{ij} = p_i q_j.
This maximum value is the sum of the individual uncertainties

    H = H_1 + H_2
      = -\sum_i p_i \log p_i - \sum_j q_j \log q_j.

These facts can be generalized to any number of different
events.

4.  Suppose there are two chance events A and B as in 3, not
necessarily independent. We define the mean conditional
uncertainty of B, knowing A, as

    \bar{H}_A(B) = \sum_A p(A) H_A(B)

where H_A(B) is the uncertainty of B when A has a definite
value A. Thus \bar{H}_A(B) is the average uncertainty of B for
all different events A, weighted according to their different
probabilities of occurrence. The uncertainty of the
joint event is the sum of the uncertainty of the first and
the mean conditional uncertainty of the second. In symbols

    H(A,B) = H(A) + \bar{H}_A(B).

This is true whether or not there are any causal connections
or correlations between the two events.

5.  In the same situation the uncertainty of B is not greater
than the joint uncertainty H(A,B),

    H(B) \le H(A,B).

The equality holds if and only if every B (of probability
greater than zero) is consistent with only one A. That
is, if A is uniquely determined by B.

6.  From properties 3 and 4 we have

    H(A) + H(B) \ge H(A,B)

    H(B) \ge H(A,B) - H(A)
         = H(A) + \bar{H}_A(B) - H(A)

    H(B) \ge \bar{H}_A(B).

Thus the uncertainty of B is not less than its average
value when we know A. Additional information never
increases average uncertainty. The equality holds if and
only if A and B are independent.

7.  Suppose we have a set of probabilities p_1, p_2, ..., p_n.
Any change toward equalization of these (supposing them
unequal) increases H. Thus if p_1 < p_2 and we increase p_1,
decreasing p_2 an equal amount (to keep the sum \sum p_i
constant at unity) so that p_1 and p_2 are more nearly equal,
then H increases. More generally if we perform any
"averaging" operation on the p_i of the form

    p_i' = \sum_j a_{ij} p_j

where \sum_i a_{ij} = \sum_j a_{ij} = 1 and all a_{ij} \ge 0, then H increases (except
in the special case where this transformation amounts to
no more than a permutation of the p_j with H of course
remaining the same).

8.  H measures in a certain sense how much "information is
generated" when the choice is made. Suppose such a chance
event occurs and we wish to describe which of the n
possible events took place. The average amount of paper
required to write it down in a properly chosen notation is,
in the cases of interest to us, about proportional to H.
Thus there might be 10^{30} + 10^{50} possible events, with
10^{30} of them having a probability of (1/2) 10^{-30} and
10^{50} of them having a
probability of (1/2) 10^{-50}. We could set up a notational
system to describe which event occurs as follows. We number
the events from 1 up to 10^{30} + 10^{50} and when one occurs
write down the corresponding number. The average amount
of paper required will be proportional to the average
number of digits we need. This will be nearly 30 if the
event is in the first group of 10^{30}, and about 50 if in the
second group. Thus the average number of digits is about
40. We also have

    H = -10^{30} (1/2) 10^{-30} \log ((1/2) 10^{-30}) - 10^{50} (1/2) 10^{-50} \log ((1/2) 10^{-50})
      \approx 40.

9.  Although the last result is only approximately true when
the number of choices is finite it becomes exactly true
when an unlimited sequence of choices is made. Thus if
a sequence of N independent choices is made each choice
being from n possibilities with probabilities
p_1, p_2, ..., p_n then the total amount of information
generated is

    H = -N \sum_i p_i \log p_i.

If N is sufficiently large, the expected number of digits
required to register the particular choice made is
arbitrarily close to H, providing the correspondence between
sequences of digits and sets of choices is correctly made.
If incorrectly made it will be greater than H. Moreover,
if N is sufficiently large the probability of needing
more than H digits is very small.

10. It can be shown that if we require certain reasonable
properties of a measure of choice or uncertainty then the
formula -\sum p_i \log p_i necessarily follows. These required
properties and the proof of this statement are given in
Appendix I. The chief property is that the measure be in
a sense additive--if a choice be decomposed into a series
of choices the total choice is the sum (properly weighted)
of the individual choices.

11. Finally we note that quantities of the type \sum p_i \log p_i
have appeared previously as measures of randomness,
particularly in statistical mechanics. Indeed the H in Boltzmann's
H theorem is defined in this way, p_i being the probability
of a system being in cell i of its phase space. Most of
the entropy formulas contain terms of this type.
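
Several of the properties listed above can be checked numerically on a small example. The sketch below is illustrative only: the joint probability table is invented, logarithms are to base ten, and the large-number example of property 8 is evaluated through its logarithms because the individual probabilities are too small to sum reliably in floating point.

```python
import math


def H(probs, base=10):
    """Uncertainty -sum p log p of a list of probabilities."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# hypothetical joint probabilities p(A, B): rows are values of A, columns values of B
joint = [[0.30, 0.10],
         [0.05, 0.55]]

pA = [sum(row) for row in joint]
pB = [sum(col) for col in zip(*joint)]

H_AB = H([p for row in joint for p in row])
H_A, H_B = H(pA), H(pB)

# mean conditional uncertainty of B knowing A: sum over A of p(A) H_A(B)
H_B_given_A = sum(pa * H([p / pa for p in row]) for pa, row in zip(pA, joint))

# property 8: each group contributes (1/2)(exponent + log10 2) digits
H_example = 0.5 * (30 + math.log10(2)) + 0.5 * (50 + math.log10(2))
```

For this table the joint uncertainty equals H(A) plus the mean conditional uncertainty, and H(B) lies between the mean conditional uncertainty and the joint uncertainty. The property 8 example comes out slightly above 40 digits.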

The base which is used in taking logarithms in the formula
amounts to a choice of the unit of measure. If the base is ten
we will call the resulting units "digits;" if the base is two
the units will be called "alternatives." One digit is about 3.3
alternatives. A choice from 1000 equally likely possibilities
is 3 digits or about 10 alternatives.
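
The change of unit is a change of logarithmic base, which the following minimal sketch makes explicit. The function name is illustrative.

```python
import math


def convert(amount, from_base, to_base):
    """Convert an amount of choice measured with one logarithmic base to another."""
    return amount * math.log(from_base) / math.log(to_base)


alternatives_per_digit = convert(1, from_base=10, to_base=2)     # log2 of 10
choice_in_digits = math.log10(1000)
choice_in_alternatives = convert(choice_in_digits, from_base=10, to_base=2)
```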

2.    Language as a Stochastic Process

A natural language, such as English, can be studied
from many points of view--lexicography, syntax, semantics,
history, aesthetics, etc. The only properties of a language
of interest in cryptography are statistical properties. What
are the frequencies of the various letters, of different digrams
(pairs of letters), trigrams, words, phrases, etc.? What is
the probability that a given word occurs in a certain message?
The "meaning" of a message has significance only in its
influence on these probabilities. For our purposes all other
properties of language can be omitted. We consider a language,
therefore, to be a stochastic (i.e., a statistical) process which
generates a sequence of symbols according to some system of
probabilities. The symbols will be the letters of the language
together with punctuation, spaces, etc., if these occur.

Conversely any stochastic process which produces a
discrete sequence of symbols will be said to be a language.
This will include such cases as:

1.  Natural written languages such as English, German, Chinese.

2.  Continuous information sources that have been rendered
discrete by some quantizing process. For example, the
quantized speech from a PCM transmitter, or a quantized
television signal.

3.  "Artificial" languages, where we merely define abstractly
a stochastic process which generates a sequence of symbols.
The following are examples of artificial languages.

(A) Suppose we have 5 letters A, B, C, D, E which are
chosen each with probability .2, successive choices
being independent. This would lead to a sequence of
which the following is a typical example.

B D C B C E C C C A D C B D D A A E C E E A
A B B D A E E C A C E E B A E E C B C E A D

This was constructed with the use of a table of random
numbers.*

* Kendall and Smith, "Tables of Random Sampling Numbers,"
Cambridge, 1939.

(B) Using the same 5 letters let the probabilities be
.4, .1, .2, .2, .1 respectively, with successive
choices independent. A typical "text" in this
language is then:

A A A C D C B D C E A A D A D A C E D A
E A D C A B E D A D D C E C A A A A A D

(C) A more complicated structure is obtained if successive
letters are not chosen independently but their
probabilities depend on preceding letters. In the simplest
case of this type a choice depends only on the
preceding letter and not on ones before that. The
statistical structure can then be described by a
set of transition probabilities p_i(j), the probability
that letter i is followed by letter j. The indices
i and j range over all the letters in the language.
A second equivalent way of specifying the structure
is to give the digram probabilities p(i,j), the relative
frequency of the digram i j in the language.
The letter frequencies p(i) (the probability of
letter i), the transition probabilities p_i(j) and the
digram probabilities p(i,j) are related by the following
formulas.

    p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j) p_j(i)

    p(i,j) = p(i) p_i(j)

    \sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1.
As a specific example suppose there are three letters
A, B, C with the probability tables:

    p_i(j)            j
                 A     B     C
           A     0    .8    .2
      i    B    .5    .5     0
           C    .5    .4    .1

      i    p(i)
      A    9/27
      B   16/27
      C    2/27

    p(i,j)            j
                 A      B       C
           A     0     4/15    1/15
      i    B    8/27   8/27     0
           C    1/27   4/135   1/135

A typical text in this language is the following.

A B B A B A B A B A B A B A B B B A B B B B B A B
A B A B A B A B B B A C A C A B B A B B B B A B B
A B A C B B B A B A


The next increase in complexity would involve trigram
frequencies but no more. The choice of a letter would
depend on the preceding two letters but not on the
text before that point. A set of trigram frequencies
p(i,j,k) or equivalently a set of transition
probabilities p_{ij}(k) would be required. Continuing in
this way one obtains successively more complicated
stochastic processes. In the general n-gram case
a set of n-gram probabilities p(i_1, i_2, ..., i_n)
or of transition probabilities p_{i_1, i_2, ..., i_{n-1}}(i_n)
is required to specify the statistical structure.

(D)    Stochastic processes can also be defined which
produce a text consisting of a sequence of "words."
Suppose there are 5 letters A, B, C, D, E and 16
"words" in the language with associated probabilities:

    .10 A       .16 BEBE    .11 CABED   .04 DEB
    .04 ADEB    .04 BED     .05 CEED    .15 DEED
    .05 ADEE    .02 BEED    .08 DAB     .01 EAB
    .01 BADD    .05 CA      .04 DAD     .05 EE

Suppose successive "words" are chosen independently
and are separated by a space. A typical message
might be:

DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE
BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE
CABED BEBE BED DAB DEED ADEB

If all the words are of finite length this process
is equivalent to one of the preceding type, but the
description may be simpler in terms of the word
structure and probabilities. We may also generalize
here and introduce transition probabilities between
words, etc.
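A message of this kind is easy to produce mechanically. The sketch below is an illustration only: `WORD_PROBS` and `sample_message` are assumed names, and the sixteen words and their probabilities are as read from the table above.

```python
import random

# The sixteen "words" of example (D) with their probabilities.
WORD_PROBS = {
    "A": .10, "BEBE": .16, "CABED": .11, "DEB": .04,
    "ADEB": .04, "BED": .04, "CEED": .05, "DEED": .15,
    "ADEE": .05, "BEED": .02, "DAB": .08, "EAB": .01,
    "BADD": .01, "CA": .05, "DAD": .04, "EE": .05,
}


def sample_message(n_words, word_probs=WORD_PROBS, seed=0):
    """Choose n_words words independently and separate them by spaces."""
    rng = random.Random(seed)
    words = list(word_probs)
    weights = [word_probs[w] for w in words]
    return " ".join(rng.choices(words, weights=weights, k=n_words))
```

Since the words are drawn independently, the only state that has to be remembered between draws is the position inside the current word, which is what makes the process equivalent to one of the preceding type.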


These artificial languages are useful in constructing
simple problems and examples to illustrate various possibilities.
We can also approximate to a natural language by means of a
series of simple artificial languages. The zero order approxi-
mation is obtained by choosing all letters with the same proba-
bility and independently. The first order approximation is
obtained by choosing successive letters independently but each
letter having the same probability that it does in the natural
language. Thus in the first order approximation to English, E
is chosen with probability .12 (its frequency in normal English)
and W with probability .02, but there is no influence between
adjacent letters and no tendency to form the preferred digrams
such as TH, ED, etc. In the second order approximation digram
structure is introduced. After a letter is chosen, the next
one is chosen in accordance with the frequencies with which
the various letters follow the first one. This requires a
table of digram frequencies $p_i(j)$, the frequency with which
letter j follows letter i. In the third order approximation
trigram structure is introduced. Each letter is chosen with
probabilities which depend on the preceding two letters.

3.    The  Series  of  Approximations  to  English 

To give a visual idea of how this series of processes
approaches a language, typical sequences in the approximations
to English have been constructed and are given below. In all
cases we have assumed a 27 symbol "alphabet," the 26 letters
and a space.

1.  Zero order approximation (symbols independent and
equiprobable).

XFCKL RXKHRJFF JUJ ZLPWCFWKErW FFJEYVKCQSGXYB
QPAAMKBZAACIBZLHJQD

2.  First order approximation (symbols independent but with
frequencies of English text).

OCRO HXI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHT
OOBTTVA NAH BRL

3.  Second order approximation (digram structure as in English).

OK IE ANTSOUTINYS ARE T INC TORE ST BE S DEAMY ACHIN D
ILCNASIVE TUCOOVSE AT TEASONARE FUSQ TIZIN ANDY TOBE
SEACE CTISBE

4.  Third order approximation (trigram structure as in English).

IN NO IST IAT WHEY CRATICT FROURE BIRS GROCID PON DEN OL
OF DEHONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE

5.  First order word approximation. Rather than continue with
tetragram, ..., n-gram structure, it is easier and better
to jump at this point to word units. Here words are
chosen independently but with their appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN
DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO
EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE
THESE.

6.  Second order word approximation. The word transition
probabilities are correct but no further structure is
included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER
THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER
METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLL
THE PROBLEM FOR AN UNEXPECTED

The resemblance to ordinary English text increases
quite noticeably at each of the above steps. Note that these
samples have reasonably good structure out to about twice the
range that is taken into account in their construction. Thus
in (3) the statistical process insures reasonable text for two-
letter sequences, but four-letter sequences from the sample can
usually be fitted into good sentences. In (6) sequences of four
or more words can easily be placed in sentences without unusual
or strained constructions. The particular sequence of ten
words "attack on an English writer that the character of this"
is not at all unreasonable.

The first two samples were constructed by the use of
a book of random numbers in conjunction for (2) with a table
of letter frequencies. This method might have been continued
for (3), (4), and (5), since digram, trigram, and word frequency
tables are available, but a simpler equivalent method was used.
To construct (3) for example one opens a book at random and
selects a letter at random on the page. This letter is re-
corded. The book is then opened to another page and one reads
until this letter is encountered. The succeeding letter is
then recorded. Turning to another page this second letter is
searched for and the succeeding letter recorded, etc. A similar
process was used for (4), (5), and (6). It would be interesting
if further approximations could be constructed, but the labor
involved becomes enormous at the next stage.
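The book method for sample (3) translates directly into a short program. This is a sketch under stated assumptions, not the procedure as actually carried out: a string `corpus` stands in for the book, "opening to another page" becomes a jump to a random position, and the text is treated as circular so that a search never runs off the end. The function name `second_order_sample` is hypothetical.

```python
import random


def second_order_sample(corpus, length, seed=0):
    """Imitate the book method for a second order approximation.

    Record a random letter; then repeatedly jump to a random place,
    read on until the last recorded letter is met, and record the
    letter that follows it.
    """
    rng = random.Random(seed)
    text = corpus.upper()
    current = text[rng.randrange(len(text))]
    out = [current]
    while len(out) < length:
        start = rng.randrange(len(text))
        # Read forward from `start`, wrapping around at the end.
        for offset in range(len(text)):
            pos = (start + offset) % len(text)
            if text[pos] == current:
                current = text[(pos + 1) % len(text)]
                out.append(current)
                break
    return "".join(out)
```

Every digram in the output occurs somewhere in the corpus, but, as with the book itself, an occurrence preceded by a long stretch without the letter is somewhat more likely to be found, so the digram frequencies are only approximately reproduced.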

The stochastic process 6 is already sufficiently close
to English for many cryptographic purposes since most crypt-
analysis is based on "local" structure of not more than two or
three words in length.

4.    Graphical Representation of a Markoff Process

Stochastic processes of the type described above are
known mathematically as discrete Markoff processes and have
been extensively studied in the literature.* The general case
can be described as follows. There exist a finite number of
possible "states" of a system: $S_1, S_2, \ldots, S_n$. In addition
there is a set of transition probabilities: $p_i(j)$, the proba-
bility that if the system is in state $S_i$ it will next go to
state $S_j$. To make this Markoff process into a language genera-
tor we need only assume that a letter is produced for each
transition from one state to another. The states will corre-
spond to the "residue of influence" from preceding letters.

* For a detailed treatment see M. Fréchet, "Méthode des fonctions
arbitraires. Théorie des événements en chaîne dans le cas
d'un nombre fini d'états possibles." Paris, Gauthier-Villars,
1938.

The situation can be represented graphically as shown
in Figs. 1, 2, 3 and 4. The "states" are the junction points
in the graph and the probabilities and letters produced for a
transition are given beside the corresponding line. Fig. 1 is
for the example B in Section 2, while Fig. 2 corresponds to
example C. In Fig. 1 there is only one state since successive
letters are independent. In Fig. 2 there are as many states
as letters. If a trigram example were constructed there would
be at most $n^2$ states corresponding to the possible pairs of
letters preceding the one being chosen. Figs. 3 and 4 show
graphs for the case of word structure in example D. In these
S corresponds to the "space" symbol. In Fig. 3 each word has
a separate chain of branches from the left to the right junction
point, while in Fig. 4 the branches have been combined, simpli-
fying the graph.

5.    Pure and Mixed Languages

As we have indicated above a "language" for our pur-
poses can be considered to be generated by a Markoff process.
Among the possible discrete Markoff processes there is a group
with special properties of significance in cryptographic work.
This special class consists of the "ergodic" processes and we
shall call the corresponding languages "pure languages." Al-
though a rigorous definition of an ergodic process is somewhat
involved, the general idea is simple. In an ergodic process
every sequence produced by the process is the same in statis-
tical properties. Thus the letter frequencies, digram fre-
quencies, etc., obtained from particular sequences will, as the
lengths of the sequences increase, approach definite limits
independent of the particular sequence. Actually this is not
true of every sequence but the set for which it is false has
probability zero. Roughly the ergodic property means statis-
tical homogeneity.

All the examples of artificial languages given above
are pure, the corresponding Markoff process being ergodic.
This property is related to the structure of the corresponding
graph. If the graph has two properties the language it generates
will be pure. These properties are:


1.  The graph cannot be divided into two parts A and B such
that it is impossible to go from junction points in part
A to junction points in part B along lines of the graph
in the direction of arrows and also impossible to go
from nodes in part B to nodes in part A.

2.  A closed series of lines in the graph with all arrows
on the lines pointing in the same orientation will be
called a "circuit." The "length" of a circuit is the
number  of  lines  in  it.  Thus  in  Fig.  4  the  series  BEE 
is a circuit of length 4. The second property required
is that the greatest common divisor of the lengths of
all circuits in the graph be one.
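Both properties can be tested mechanically once the graph is written down. The sketch below is illustrative and makes two assumptions that should be stated plainly. First, the graph is given as a dict mapping each junction point to the list of points reachable along one line. Second, instead of property 1 as worded it tests the stronger condition that every junction point can reach every other, which implies property 1. For such a graph the greatest common divisor of all circuit lengths can be found from breadth-first distances, as the gcd of d(u) + 1 - d(v) over all lines u -> v. All function and variable names are hypothetical.

```python
from collections import deque
from math import gcd


def reachable(graph, start):
    """Set of junction points reachable from start along the arrows."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(graph):
    """True if every junction point can reach every other one."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    root = next(iter(nodes))
    reverse = {n: [] for n in nodes}
    for u, vs in graph.items():
        for v in vs:
            reverse[v].append(u)
    return reachable(graph, root) == nodes and reachable(reverse, root) == nodes


def circuit_gcd(graph, root):
    """GCD of all circuit lengths, for a strongly connected graph."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    g = 0
    for u in dist:
        for v in graph.get(u, ()):
            g = gcd(g, dist[u] + 1 - dist[v])
    return g


def is_pure(graph):
    """Strong connectivity together with circuit-length gcd equal to one."""
    return is_strongly_connected(graph) and circuit_gcd(graph, next(iter(graph))) == 1


# Digram example C: lines exist wherever the transition probability is not zero.
LANGUAGE_C = {"A": ["B", "C"], "B": ["A", "B"], "C": ["A", "B", "C"]}
# A graph in which a is followed by b or c, and b or c always by a.
PERIODIC = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
```

With these definitions `is_pure(LANGUAGE_C)` holds, while `PERIODIC` passes the connectivity test but has `circuit_gcd(PERIODIC, "a") == 2`.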

If the first condition is satisfied but the second
one violated by having the greatest common divisor equal to
d > 1, the sequences have a certain type of periodic structure.
The various sequences fall into d different classes which are
statistically the same apart from a shift of the origin (i.e.,
which letter in the sequence is called letter 1). By a shift
of from 0 up to d - 1 any sequence can be made statistically
equivalent to any other. A simple example with d = 2 is the
following. There are three possible letters a, b, c. Letter
a is followed with either b or c with probabilities 1/3 and 2/3
respectively. Either b or c is always followed by letter a.
Thus a typical sequence is

a b a c a c a c a b a c a b a b a c a c.

This type of situation is not of much importance for our work.

If the first condition is violated the graph may be
"separated" into a set of subgraphs each of which satisfies the
first condition. We will assume that the second condition is
also satisfied for each subgraph. We have in this case what
may be called a "mixed" language made up of a number of pure
components. The components correspond to the various subgraphs.
If $L_1, L_2, L_3, \ldots$ are the component languages we may write

$$L = p_1 L_1 + p_2 L_2 + p_3 L_3 + \cdots$$

where $p_i$ is the a priori probability of the component language
$L_i$.

Physically the situation represented is this. There
are several different languages $L_1, L_2, L_3, \ldots$ which are each
of homogeneous statistical structure (i.e., they are pure
languages). We do not know a priori which is to be used, but
once the sequence starts in a given pure component $L_i$ it continues
indefinitely according to the statistical structure of that
component. We do have, however, a set of a priori probabilities
for the various components, $p_1, p_2, \ldots$.

As an example one may take two of the artificial
languages defined above and assume $p_1 = .2$ and $p_2 = .8$. A
sequence from the mixed language

$$L = .2\,L_1 + .8\,L_2$$

would be obtained by choosing first $L_1$ or $L_2$ with probabilities
.2 and .8 and after this choice generating a sequence from
whichever was chosen.

A natural language, such as English or German, is
not, of course, pure. Different kinds of text (literary,
newspaper, technical or military) display consistently differ-
ent types of structure. These differences are small, however,
in comparison with the differences between different natural
languages. If only local structure (letter, digram and trigram
frequencies, for instance) is of much importance, it is reason-
able to consider "normal English" to be nearly pure.

6.    Information  Rate  and  Redundancy  of  a  Language 

Suppose we have a pure language L produced by a given
Markoff process. Associated with the language there are certain
parameters which are of significance in questions of trans-
forming the language and in cryptography. The most important
of these is what we will call the "information rate" R for the
language. It measures the rate at which the Markoff process
"generates information," as determined by the measurement of
the amount of choice available on the average per letter of
text that is produced. In Section 1 we defined the amount of
choice when there are various possibilities with probabilities
$p_1, p_2, \ldots, p_n$ as

$$H = -\sum_i p_i \log p_i .$$

In a Markoff process with a number of different "states" there
will be a choice value $H_i$ for each of these states and a proba-
bility of being in each of the states (or a frequency with which
this state occurs). If this relative frequency for state i is
$P_i$, the average amount of choice is

$$R = \sum_i P_i H_i$$

summed over all the states. This is the definition of the
information rate for the language. If $p_i(j)$ is the probability
of producing letter j when in state i we have

$$H_i = -\sum_j p_i(j) \log p_i(j)$$

the sum being over all the letters in the language. Thus

$$R = -\sum_{i,j} P_i\, p_i(j) \log p_i(j)$$

The information rate R has the units of alternatives
(or digits) per letter since it measures the average amount of
choice per letter of text that is produced.
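As a worked illustration, the rate R can be computed for the three-letter digram language of Section 2, whose states are the preceding letters. This is a sketch only: the function names are assumptions, and logarithms are taken to base 2 so that the result is in alternatives per letter (base 10 would give digits per letter).

```python
from math import log2


def choice(probs):
    """H = -sum p log2 p; terms with p = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def information_rate(state_probs, emit_probs):
    """R = sum_i P_i H_i, with H_i computed from the row p_i(j)."""
    return sum(
        state_probs[i] * choice(emit_probs[i].values()) for i in state_probs
    )


# State frequencies P_i and letter probabilities p_i(j) for the
# three-letter digram example of Section 2.
STATE_PROBS = {"A": 9 / 27, "B": 16 / 27, "C": 2 / 27}
EMIT_PROBS = {
    "A": {"A": 0.0, "B": 0.8, "C": 0.2},
    "B": {"A": 0.5, "B": 0.5, "C": 0.0},
    "C": {"A": 0.5, "B": 0.4, "C": 0.1},
}

R = information_rate(STATE_PROBS, EMIT_PROBS)
```

The value obtained is a little under one alternative per letter, well below the log2 3 that three equally likely, independent letters would give.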

A second parameter of importance is the "maximum rate"
$R_0$ for the source. This is defined simply as the logarithm of
the number of different letters in the language. $R_0$ is also
measured in alternatives or digits per letter. If successive
letters are chosen independently and each letter is equally
likely $R_0 = R$. Otherwise we have $R < R_0$.

R and $R_0$ are actually two limiting cases of informa-
tion rates for the language. $R_0$ may be said to be the rate
when no statistical structure is taken into consideration and
R is the rate when all the structure is taken into account.
Between these there is an infinite series of rates $R_1, R_2,
R_3, \ldots$ which take some of the statistical structure into
account. $R_1$ takes the letter frequencies into account and is
defined by

$$R_1 = -\sum_i p(i) \log p(i)$$

where p(i) is the probability of letter i. $R_2$ takes digram
structure into account and is defined by

$$R_2 = -\sum_{i,j} p(i)\, p_i(j) \log p_i(j)$$

where the p(i) are letter probabilities and $p_i(j)$ the transition
probabilities, i.e., the probability of letter i being followed
by letter j. In general we define

$$R_n = -\sum p(i_1, \ldots, i_{n-1})\; p_{i_1 \cdots i_{n-1}}(i_n)\,
\log p_{i_1 \cdots i_{n-1}}(i_n)$$

where the sum is on all indices $i_1, \ldots, i_n$ and
$p(i_1, \ldots, i_{n-1})$ is the probability of the (n-1)-gram
$i_1 \cdots i_{n-1}$, with $p_{i_1 \cdots i_{n-1}}(i_n)$ the
probability of this (n-1)-gram being followed by letter $i_n$.
$R_n$ may be called the n-gram information rate for
the language. It can be shown that

$$R_0 \ge R_1 \ge R_2 \ge \cdots \ge R_\infty = R$$
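The n-gram rates can also be estimated from a sample of text by replacing the probabilities with observed relative frequencies. The following sketch is an assumption-laden illustration rather than a method given in the memorandum: `ngram_rate` and `max_rate` are hypothetical names, base 2 logarithms are used, and the estimate is only meaningful when the sample is long compared with the number of distinct n-grams.

```python
from collections import Counter
from math import log2


def max_rate(text):
    """R_0: logarithm of the number of different letters seen."""
    return log2(len(set(text)))


def ngram_rate(text, n):
    """Estimate R_n from the n-gram counts of a sample text.

    Each n-gram contributes -p(n-gram) * log2 of the observed
    probability of its last letter given the preceding n-1 letters.
    """
    grams = Counter(text[k:k + n] for k in range(len(text) - n + 1))
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    total = sum(grams.values())
    return -sum(
        (count / total) * log2(count / contexts[gram[:-1]])
        for gram, count in grams.items()
    )
```

For a strictly alternating text such as `"AB" * 50` the letter frequencies alone leave one alternative per letter of choice, while the digram structure removes it entirely.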

These rates determine how much a language can be "compressed"
in length by a suitable encoding process. A language with
maximum rate $R_0$ and rate R can be transformed in such a way
that a sequence of letters N letters long is transformed into
a sequence of letters only N' letters long where

$$N' R_0 = N R$$

(This is approximate and only exactly true in the limit as
$N \to \infty$.) Thus the information is "compressed" in the ratio

$$\frac{R}{R_0}$$

This is the greatest compression ratio possible. It makes use
of all the statistical structure of the language. If only
n-gram structure is made use of, a compression ratio

$$\frac{R_n}{R_0}$$

is the best possible.


The compression obtained in this way is only a
statistical gain. Some infrequent sequences are encoded into
much longer sequences while the more probable ones go into
shorter sequences so that on the average the length is de-
creased. It is the type of compression obtained in telegraphy
by using the shortest telegraph symbol, a single dot, for the
most frequent letter E, while uncommon letters Q, Z, etc., are
encoded into longer telegraph symbols. An average reduction
in time of transmission is obtained but there are possible
sequences, e.g., Q Q Q ..., which require much longer.

Performing a transformation on a language L which
compresses as much as possible will be called reducing L to
a "normal" form. When this has been done it can be shown
that all letters in the output are equally likely and inde-
pendent. Actually to realize this transformation would usually
require an infinitely complex machine, but we can always ap-
proximate it as closely as desired with a machine of finite
complexity.


The quantity

$$D = R_0 - R$$

will be called the redundancy rate of the language. It measures
the excess information that is sent if sequences in the language
are transmitted in their original form (without compression or
reduction to normal form). Correspondingly there is a whole
series of redundancy rates:

$$D_1 = R_0 - R_1$$
$$D_2 = R_0 - R_2$$
$$\cdots$$
$$D_n = R_0 - R_n$$

$D_n = R_0 - R_n$ is the redundancy rate due to n-gram structure
in the language.

The redundancy D can also be said to measure the
amount of statistical structure in the language. If the se-
quence is purely random D = 0 while at the other extreme if
each letter is completely determined by preceding letters with
no freedom of choice, D has its maximum possible value $R_0$. It
is sometimes convenient to use the "relative" redundancy $D/R_0$
which must lie between 0 and 100%.

If we have a source of rate R, maximum rate $R_0$ (both
in digits per letter) and consider the possible sequences of N
letters these fall into two groups for N large. One group of
"high probability" sequences contains about

$$10^{RN}$$

sequences (where we have assumed R measured in digits per letter).
All of these have substantially the same logarithmic probability.
The remainder of the total of $10^{R_0 N}$ possible sequences are of
very small probability. In fact their total probability ap-
proaches zero as N increases. The logarithm of the probability
of an individual sequence in the high probability group is thus
about -RN. In a precise statement of these results we must allow
a certain fuzziness in R, i.e., replace R by $R \pm \epsilon$ where
$\epsilon \to 0$ as $N \to \infty$.

Reduction of a language to normal form is performed
by properly matching the probabilities of sequences to the
length of the corresponding sequences in the normal form. The
"high probability" sequences are translated into short sequences
and the remainder into longer sequences.

An example will clarify the results we have given.
Let the language contain 4 letters A, B, C, D. In a sequence
successive letters are chosen independently, the four letters
having probabilities 1/2, 1/4, 1/8, 1/8, respectively. We have

$$R_0 = \log_2 4 = 2 \text{ alternatives/letter}$$

and

$$R_1 = R_2 = R_3 = \cdots = R
= -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4}
+ \tfrac{2}{8}\log_2\tfrac{1}{8}\right)
= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4}
= \tfrac{7}{4} \text{ alternatives/letter}$$

By a suitable transformation the average length of sequences
can be reduced by the factor (7/4)/2 = 7/8. A transformation to do
it is the following. First we translate into a sequence of
binary digits (0 or 1) by the following table

    A    0
    B    10
    C    110
    D    111

After this pairs of the binary digits are translated into the
original alphabet as follows

    00   A'
    01   B'
    10   C'
    11   D'

For a typical sequence this works out as shown below:

A  B   C    A  B   A  C    B   B   D    A  A  D    A  D    A
0  10  110  0  10  0  110  10  10  111  0  0  111  0  111  0

Regrouping and translation back into letters:

01  01  10  01  00  11  01  01  01  11  00  11  10  11  10
B'  B'  C'  B'  A'  D'  B'  B'  B'  D'  A'  D'  C'  D'  C'

In this case there are 16 letters in the original and 15 in the
final text. Thus due to the small redundancy and the shortness
of the text only part of the saving is evident. In a long text
however the full reduction of 7/8 would appear. This may be
verified directly in this case. In a long text of N letters
each letter will appear with about its appropriate frequency.
Thus the number of binary digits will be about

$$N\left[\tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3
+ \tfrac{1}{8}\cdot 3\right] = \tfrac{7}{4} N$$

since each A gives one binary digit, each B gives two, etc. The
number of letters in the final text is half this since each
pair of binary digits goes into one letter. Thus the reduction
is by a factor 7/8.

It is also easy to see in this case that the binary
digits are equally likely and independent, and from this that
the final text letters are also.
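The two translation tables of this example can be applied mechanically. The sketch below is illustrative; the names are assumptions, and so is the handling of a binary sequence of odd length (a trailing 0 is appended), a case the example does not address.

```python
# First table: letters to binary digits.
LETTER_TO_BITS = {"A": "0", "B": "10", "C": "110", "D": "111"}
# Second table: pairs of binary digits back to the four-letter alphabet.
PAIR_TO_LETTER = {"00": "A'", "01": "B'", "10": "C'", "11": "D'"}


def to_bits(message):
    """Translate a message over A, B, C, D into binary digits."""
    return "".join(LETTER_TO_BITS[ch] for ch in message)


def to_normal_form(message):
    """Regroup the binary digits in pairs and translate into letters."""
    bits = to_bits(message)
    if len(bits) % 2:
        bits += "0"  # assumption: pad an odd-length sequence with a 0
    return [PAIR_TO_LETTER[bits[k:k + 2]] for k in range(0, len(bits), 2)]


def bits_per_letter(probs):
    """Average number of binary digits produced per original letter."""
    return sum(probs[ch] * len(LETTER_TO_BITS[ch]) for ch in probs)


MESSAGE = "ABCABACBBDAADADA"
PROBS = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
```

`to_normal_form(MESSAGE)` returns the 15 letters of the final text given above, and `bits_per_letter(PROBS) / 2` gives the long-run ratio 7/8 of final to original length.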

This situation is more complicated for mixed languages
and we shall not enter into it here. We may note, however,
that if

$$L = p_1 L_1 + \cdots + p_n L_n$$

where $L_i$ is pure with rate $R_i$, then the long sequences of L
fall into (n+1) groups. The first n groups correspond to the
pure components. Those in group i number about

$$10^{R_i N}$$

and have logarithmic probability about

$$\log p_i - R_i N$$

The last group contains all other sequences and has a small
total probability.

7.    Redundancy Characteristic of a Language

The form of the curve D(N) as a function of N may be
called the redundancy characteristic of the language. In a
rough way it describes the way in which the redundancy appears.
In Fig. 5 several types of characteristics are shown, all with
the same final redundancy. The way in which this final value is
approached is of importance in cryptography. For languages which
reach final redundancy at one or two letters (Curves 1 and 2) one
type of cipher (ideal ciphers) can be used. For those which remain
near zero out to fairly large N (like Curve 5) another type is
appropriate. Natural languages are apt to show a characteristic
more like 3, and this makes them difficult to encipher with
security by simple means.

Examples:

1.  A language in which successive letters are independent
but with different probabilities has a characteristic of
Type 1.

2.  Consider a language constructed as follows. First select
$26^8$ different sequences of letters, each 16 letters long,
from the $26^{16}$ possible sequences of this length. This
should be a random selection. The 16-letter sequences
chosen are the "words" of the language. Messages are
random sequences of these "words." Such a language has
a characteristic like the Curve 5.

3.  A language with digram structure only, such as Example C
in Section 2 above, has a characteristic of the Type 2 in
Fig. 5, reaching its final value at N = 2.

4.  English has the characteristic 3 in Fig. 5.

The redundancy characteristic describes how the
structure in the language is spread out. If the structure is
localized, the curve rises rapidly to its final value. If
there are long range influences the asymptotic value is ap-
proached more slowly. If the structure is "locally random"
the curve will remain near zero for small N.


8.    Secrecy  Systems 

Before we can apply any mathematical analysis to
secrecy systems, it is necessary to idealize the situation
suitably, and to define in a mathematically acceptable way
what we shall mean by a secrecy system. A "schematic" diagram
of a general secrecy system is shown in Fig. 6. At the trans-
mitting end there are two information sources: a message source
and a key source. The key source produces a particular key from
among those which are possible in the system. This key is trans-
mitted by some means, supposedly not interceptible, e.g. by mes-
senger, to the receiving end. The message source produces a
message (the "clear") which is enciphered, and the resulting
cryptogram sent to the receiving end by a possibly interceptible
means, for example radio. At the receiving end the cryptogram
and key are combined in the decipherer to recover the message.

Evidently the encipherer performs a functional opera-
tion. If M is the message, K the key, and E the enciphered mes-
sage, or cryptogram, we have

$$E = f(M, K)$$

i.e. E is a function of M and K. We prefer to think of this,
however, not as a function of two variables but as a (one para-
meter) family of operations or transformations, and we write it

$$E = T_i M .$$

The transformation $T_i$ applied to message M produces cryptogram E.
The index i corresponds to the particular key being used. If
there are m possible keys there will be m transformations in the
family $T_1, T_2, \ldots, T_m$.

At the receiving end it must be possible to recover
M, knowing E and K. Thus the transformations in the family
must have unique inverses

$$M = T_i^{-1} E$$

at any rate this inverse must exist uniquely for every E which
can be obtained from an M with key i.
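To make the notion of a family of transformations concrete, the sketch below uses a simple shift of the alphabet as a stand-in for $T_i$. The choice of cipher, the function names, and the representation of the key source as a dict of probabilities are all illustrative assumptions and are not taken from the memorandum.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def make_transformation(i):
    """Return the pair (T_i, inverse of T_i) for a shift by i places."""

    def encipher(message):
        return "".join(
            ALPHABET[(ALPHABET.index(ch) + i) % 26] for ch in message
        )

    def decipher(cryptogram):
        return "".join(
            ALPHABET[(ALPHABET.index(ch) - i) % 26] for ch in cryptogram
        )

    return encipher, decipher


def choose_key(key_probs, seed=None):
    """Key source: pick a key index according to the given probabilities."""
    rng = random.Random(seed)
    keys = list(key_probs)
    return rng.choices(keys, weights=[key_probs[k] for k in keys])[0]


# The family T_0, ..., T_25, each with its unique inverse.
FAMILY = {i: make_transformation(i) for i in range(26)}
```

For example, with `T, T_inv = FAMILY[3]`, `T("ATTACK")` is the cryptogram E and `T_inv(E)` recovers the message.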

The key source can be thought of as a "probability
machine," something which chooses from the possible keys ac-
cording to a system of probabilities. Mathematically then, the
keys (or the parameter of the family of transformations) belong
to a probability or measure space. Hence we arrive at the
definition:

A secrecy system is a family of uniquely reversible
transformations $T_i$ of a message space $\Omega_M$ into a cryptogram
space $\Omega_E$, the parameter i belonging to a probability space $\Omega_K$.
Conversely any set of entities of this type will be called a
"secrecy system."

The system can be visualized mechanically as a
machine with one or more controls on it. A sequence of letters,
the message, is fed into the input of the machine and a second
series emerges at the output. The particular setting of the
controls corresponds to the particular key being used. Some
method must be prescribed for choosing the key from all the
possible ones.

To make the problem mathematically tractable we shall
assume that the enemy knows the system being used. That is, he
knows the family of transformations $T_i$, and the probabilities
of choosing various keys.

One might object to this as being unrealistic, in that
the cryptanalyst often does not know what system was used or the
probabilities of various keys. There are two answers to this
objection.

1.  The assumption is actually the one ordinarily used
in cryptographic studies. It is pessimistic and
hence safe, but in the long run realistic (particu-
larly in military work), since one must expect his
system to be found out eventually through espionage,
captured equipment, prisoners, etc. Thus, even when
an entirely new system is devised, so that the enemy
cannot assign any a priori probability to it without
discovering it himself, one must still live with the
expectation of his eventual knowledge.

2.  The restriction is much weaker than appears at first,
due to our broad definition of what constitutes the
system. Suppose a cryptographer intercepts a message
and does not know whether a substitution, transposi-
tion, or Vigenère type cipher was used. He can con-
sider this as being enciphered by a system in which
part of the key is the specification of which of these
types was used, the next part being the particular
key for that type. These three different possibil-
ities are assigned probabilities according to his
best guesses of the a priori probabilities of the en-
cipherer using the respective types of cipher.

A second possible objection to our definition of
secrecy systems is that no account is taken of the common
practice of inserting nulls in a message and the use of mul-
tiple substitutes. Thus there is not a unique $E = T_i M$, but
actually the encipherer can choose at will among a number of
different E's for the same message and key. This situation
could be handled, but would only add complexity at the present
stage, without altering any of the basic results. To define
the more general secrecy system, one would add a second para-
meter to the transformations $T_i$, which corresponds to the
various choices of cryptograms corresponding to a given mes-
sage and key. It is possible, but not always desirable, to
consider this second parameter as part of the key, since it
does not need to be transmitted to the receiving point.

We also assume that the enemy is in possession of the
measure in the space $\Omega_M$, the a priori probabilities of various
messages. The same objection, and essentially the same answer,
might be given to this assumption as to his knowledge of the
transformations $T_i$. This measure, however, we do not consider
as part of the secrecy system for reasons which will appear
later. The secrecy system whose transformations are $T_i$ will
be denoted by T and this concept includes the space $\Omega_M$ on
which T operates (without its measure), the transformations $T_i$,
and the spaces $\Omega_K$ and $\Omega_E$, the former with its probability
measure.

If the messages are produced by a Markoff process
of the type described previously, the probabilities of various
messages are determined by the structure of the Markoff process.
For the present, however, we wish to take a more general view
of the situation and regard the messages as merely an abstract
set of entities with associated probabilities, not necessarily
composed of a sequence of letters and not necessarily produced
by a Markoff process.

It should be emphasized that throughout the paper a
secrecy system means not one but a set of many transformations.
After the key is chosen only one of these transformations is
used and we might be led to define a secrecy system as a single
transformation on a language.* The enemy, however, does not
know what key was chosen and the "might have been" keys are as
important for him as the actual one. Indeed it is only the
existence of these other possibilities that gives the system

*A. A. Albert, in a paper presented at a Manhattan, Kansas,
meeting of the American Mathematical Society (Nov. 22, 1941)
entitled "Some Mathematical Aspects of Cryptography," has
defined a ciphering system in this way. With this limited
definition about all one can do is to describe and classify
from the mathematical point of view various types of
transformations.

any secrecy. Since the secrecy is our primary interest, we
are forced to this rather elaborate concept of a secrecy
system. This type of situation where possibilities are as
important as actualities is almost the rule in games of
strategy. The course of a chess game is largely controlled
by threats which are not carried out. See also the "virtual
existence" of unrealized imputations in von Neumann's theory
of games.

There are a number of difficult epistemological
questions connected with the theory of secrecy, or in fact
with any theory which involves questions of probability
(particularly a priori probabilities, Bayes' theorem, etc.)
when applied to a physical situation. Treated abstractly,
probability theory can be put on a rigorous logical basis
with the modern measure theory approach.* As applied to
reality, however, especially when "subjective" probabilities
and unrepeatable experiments are concerned, there are many
questions of logical validity. For example in the approach
to secrecy made here, a priori probabilities of various keys
are assumed known by the enemy cryptographer: how can one
determine operationally if his estimates are correct, on the
basis of his knowledge of the situation?

It may happen that the keys are chosen by the
encipherer according to one system of probabilities, i.e. a
measure in the key space $\Omega_K$, and that the enemy cryptanalyst
estimates a second different system of probabilities $\Omega_K'$ in
this space which are entirely reasonable in the light of
his knowledge of the situation: which is correct? I
believe that both are correct. The calculation based on $\Omega_K$
leads to the solution when the enemy knows just how the
keys are chosen and the solution based on $\Omega_K'$ leads to
solutions which are correct for a situation agreeing with the
enemy's knowledge of the actual situation. It appears
intuitively that the enemy's lack of knowledge can only do
him harm, and probably this can be proved, but this question
has not been investigated. In fact, we assume only one
measure $\Omega_K$ in the key space. Similar remarks may be made
regarding measure in the message space $\Omega_M$.


*See J. L. Doob, "Probability as Measure," Annals of Math.
Stat., v. 12, 1941, pp. 206-214.

A. Kolmogoroff, "Grundbegriffe der Wahrscheinlichkeitsrechnung,"
Ergebnisse der Mathematik, v. 2, No. 3 (Berlin 1933).

Actually in practical situations, only extreme
errors in a priori probabilities of keys and messages cause
much error in the important parameters. This is because of
the exponential behavior of the number of messages, etc.,
and the logarithmic measures employed.

With regard to the application of the mathematical
theory of probability to physical situations there are two
main theories or ways of setting up the correspondence. (1)
The frequency theory. Probability is correlated with relative
frequency of an event. This is the correspondence used by
the practicing statistician, in principle by the physicist,
etc. (2) The degree of belief approach. Probability is a
subjective phenomenon and measures one's degree of belief in
the occurrence of an event. This approach is seen often in
the work of historians, judges, and in everyday life.
Although this latter approach has often been attacked as
meaningless we cannot agree with this opinion. In the first place
the intuitive approach can be given a rigorous mathematical
foundation. This has been done in a very elegant way by
B. O. Koopman.* Essentially one need only assume that a man
be capable of making probability judgments (Event A is more or
less probable than event B or they are equiprobable) and that
his judgments be self consistent (e.g. if he judges A more
probable than B and B more probable than C he should judge A
more probable than C). One can even establish numerical values
by the use of a "standard gauge," for example a roulette wheel,
and thus relate the subjective and the frequency probabilities.
In the second place, on pragmatic grounds one can hardly reject
the subjective applications, since almost all of our everyday
decisions are based on this sort of probability judgment.
Cryptographic work involves both types of applications. In
the use of frequency tables, significance tests etc., the
cryptanalyst is following the frequency approach. In the
"intuitive" methods of cryptanalysis (probable words etc.) the
degree of belief approach is more in evidence.

We may remark that a single operation on a
language which is reversible forms a degenerate type of secrecy
system under our definition: a system with only one key of
unit probability. Such a system has no secrecy; the
cryptanalyst finds the message by applying the inverse of this
transformation, the only one in the system, to the intercepted
cryptogram. The decipherer and cryptanalyst in this case


*B. O. Koopman, "The Axioms and Algebra of Intuitive
Probability," Annals of Mathematics, v. 41, no. 2, 1940,
p. 269. "Intuitive Probabilities and Sequences," v. 42,
no. 1, 1941, p. 169.

possess the same information. In general, the only difference
between the decipherer's knowledge and the enemy cryptanalyst's
knowledge is that the decipherer knows the particular key being
used, while the cryptanalyst only knows the a priori
probabilities of the various keys in the set. The process of
deciphering is that of applying the inverse of the particular
transformation used in enciphering to the cryptogram. The process
of cryptanalysis is that of attempting to determine the message
(or the particular key) given only the cryptogram and the
a priori probabilities of various keys and messages.

A system will be called "closed" if any possible
cryptogram can be deciphered with any possible key. This means
that the inverse transformations $T_i^{-1}$ are all defined for every
element in the cryptogram space.

We shall use the notation |M| for the "size" of the
message space:

    $|M| = -\sum P(M) \log P(M)$

where P(M) is the probability of message M and the sum is over
all messages of just N letters. Thus |M| is a function of N
and measures the amount of "choice" in the selection of an N
letter message. For large N, |M| is approximately RN.
Similarly |K| is the size of the key space

    $|K| = -\sum P(K) \log P(K)$

the sum being over all keys.
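
As a rough numerical illustration of these "size" measures, the
following Python sketch evaluates the sum of -p log p over a list of
probabilities. The function name space_size and the use of base-2
logarithms are assumptions made for the example; the text does not
fix the base at this point.

```python
import math

def space_size(probabilities, base=2):
    """Return -sum(p * log p) over the nonzero probabilities."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# A key space of 26 equally likely keys (e.g. a Caesar cipher).
uniform_keys = [1 / 26] * 26
print(space_size(uniform_keys))      # about log2(26)

# A skewed key space offers less "choice" than a uniform one.
skewed_keys = [0.9] + [0.1 / 25] * 25
print(space_size(skewed_keys) < space_size(uniform_keys))
```

With equally likely keys the size is simply the logarithm of the
number of keys; any departure from uniformity reduces it.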

9.    Representation  of  Systems 

A secrecy system can be represented in various ways.
One which is convenient for illustrative purposes is a line
diagram, as in Figs. 7, 10, 11. The possible messages are
represented by points at the left and the possible cryptograms
by points at the right. If a certain key, say key 1,
transforms message $M_2$ into cryptogram $E_3$ then $M_2$ and $E_3$ are
connected by a line labeled 1, etc. From each possible message
there must be exactly one line emerging for each different key.


A second representation is by means of a rectangular
array. This may be done in three different ways. For the
closed system of Fig. 7, the three arrays are as follows:

    Rows M, columns K, entries E:

              K:   1    2    3
        M1        E1   E4   E2
        M2        E3   E1   E4
        M3        E4   E3   E1
        M4        E2   E2   E3

    Rows M, columns E, entries K:

              E:   E1   E2   E3   E4
        M1         1    3    -    2
        M2         2    -    1    3
        M3         3    -    2    1
        M4         -   1,2   3    -

    Rows E, columns K, entries M:

              K:   1    2    3
        E1        M1   M2   M3
        E2        M4   M4   M1
        E3        M2   M3   M4
        E4        M3   M1   M2

From the first array, key 1 transforms M2 into E3. From the
second, M4 goes into E2 by either key 1 or key 2, and into E3
by key 3. From the third, E3 is deciphered by key 2 to give M3.
All of these arrays and the line diagram contain equivalent
information; from any one the others can be derived.
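
The equivalence of the array representations can be checked
mechanically. The sketch below starts from an enciphering array
(rows M, columns K, entries E) and derives the other two from it. The
particular entries are an assumed reading of the Fig. 7 example, and
the function names are hypothetical.

```python
# Enciphering array: message -> {key: cryptogram} (assumed reading of Fig. 7).
ENCIPHER = {
    "M1": {1: "E1", 2: "E4", 3: "E2"},
    "M2": {1: "E3", 2: "E1", 3: "E4"},
    "M3": {1: "E4", 2: "E3", 3: "E1"},
    "M4": {1: "E2", 2: "E2", 3: "E3"},
}

def keys_array(encipher):
    """(message, cryptogram) -> sorted list of keys carrying one into the other."""
    table = {}
    for message, row in encipher.items():
        for key, cryptogram in row.items():
            table.setdefault((message, cryptogram), []).append(key)
    return {pair: sorted(keys) for pair, keys in table.items()}

def decipher_array(encipher):
    """(cryptogram, key) -> message, i.e. the inverse transformations."""
    return {
        (cryptogram, key): message
        for message, row in encipher.items()
        for key, cryptogram in row.items()
    }

print(keys_array(ENCIPHER)[("M4", "E2")])     # keys taking M4 into E2
print(decipher_array(ENCIPHER)[("E3", 2)])    # message behind E3 under key 2
```

For a closed system the deciphering array has an entry for every
pair of cryptogram and key, which here means 4 x 3 = 12 entries.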


These arrays describe the set of transformations in the
system. To specify the system the probabilities of various keys
must also be given. This may be done by merely listing the keys
with their probabilities. Similarly the message space is
specified by the probabilities of the various messages.

A more common way to describe the set of transformations is to
describe the operations one performs on the message for an
arbitrary key to obtain the cryptogram. Similarly one defines
implicitly the probabilities for various keys by describing how
a key is chosen, or what we know of the enemy's habits of key
choice. The probabilities for messages are implicitly determined
by stating our a priori knowledge of the enemy's language habits,
the tactical situation (which will influence the probable content
of the message) and any special information we may have regarding
the cryptogram.

10.  Notation 


The following notation will generally be followed.

M = the message
K = the key
E = the enciphered message or cryptogram
$\Omega_M$ = the message space with its associated probabilities
$\Omega_K$ = the key space with its probabilities, also a
      probability space
$\Omega_E$ = the cryptogram space, also a probability space, since
      the probabilities in $\Omega_M$ and $\Omega_K$ induce probabilities
      in $\Omega_E$ for each cryptogram
$m_i$ = the i-th letter of the message
$e_i$ = the i-th letter of the cryptogram
$k_i$ = the i-th letter of the key when it can be so described

Generally P stands for a probability. Conditional
probabilities are indicated with subscripts. Thus

P(M) = probability of message M
P(E) = probability of cryptogram E
P(K) = probability of key K
$P_M(E)$ = conditional probability of E if message M is chosen
$P_E(M)$ = conditional probability of M if cryptogram E is
      intercepted, i.e. the a posteriori probability of
      M if E is observed
Q = equivocation, a concept to be defined precisely later
      which measures the uncertainty of some knowledge
      defined only by probabilities. We also have conditional
      equivocations, thus $Q_M(K)$ is the equivocation of the
      key knowing the message.
$|K| = -\sum P(K) \log P(K)$, the size of the key space
$|M| = -\sum P(M) \log P(M)$, the size of the message space
$|E| = -\sum P(E) \log P(E)$, the size of the cryptogram space
m = number of different keys
N = number of intercepted letters
$R_0$ = maximum information rate for a language
R = mean rate
$D = R_0 - R$ = redundancy of a language
T, R, S, etc. = secrecy systems
$T_i$, $R_j$, $S_k$, etc. = particular transformations of these
      systems


11. Some Examples of Secrecy Systems

In this section a number of examples of ciphers will
be given. These will often be referred to in the remainder of
the paper for illustrative purposes.

1.    Simple  Substitution  Cipher. 


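In this cipher each letter of the message is replaced
by a fixed substitute, usually also a letter. Thus the message

    $M = m_1 m_2 m_3 m_4 \ldots$

becomes

    $E = e_1 e_2 e_3 e_4 \ldots = f(m_1) f(m_2) f(m_3) f(m_4) \ldots$

where the function f(m) is a function with an inverse. The key
is a permutation of the alphabet (when the substitutes are
letters), written so that the first letter, X, is the
substitute for A, the second letter, G, is the substitute
for B, etc.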
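
A minimal sketch of simple substitution in Python follows. The key
string is only an illustrative permutation chosen so that X
substitutes for A and G for B, as in the description; it is an
assumption, as are the function names.

```python
import string

ALPHABET = string.ascii_uppercase
# Illustrative key (assumption): the i-th letter substitutes for the
# i-th letter of the alphabet, so X replaces A, G replaces B, etc.
KEY = "XGUACDTBFHRSLMQVYZWIEJOKNP"

def substitute(message, key=KEY):
    """Encipher: replace each letter m by f(m)."""
    return message.translate(str.maketrans(ALPHABET, key))

def unsubstitute(cryptogram, key=KEY):
    """Decipher: apply the inverse function to each letter."""
    return cryptogram.translate(str.maketrans(key, ALPHABET))

print(substitute("AB"))                            # XG
print(unsubstitute(substitute("NOWISTHETIME")))    # NOWISTHETIME
```

Because the key is a permutation, the inverse table always exists,
which is the requirement that f have an inverse.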

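2. Transposition (Fixed Period d).

The message is divided into groups of length d and a
permutation applied to the first group, the same permutation to
the second group, etc. The permutation is the key and can be
represented by a permutation of the first d integers. Thus for
d = 5, we might have 2 3 1 5 4 as the permutation. This means
that $m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_{10} \ldots$ becomes
$m_2 m_3 m_1 m_5 m_4 m_7 m_8 m_6 m_{10} m_9 \ldots$. Sequential
application of two or more transpositions will be called compound
transposition. If the periods are $d_1, d_2, \ldots, d_s$ it is clear
that the result is a transposition of period d, where d is
the least common multiple of $d_1, d_2, \ldots, d_s$.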
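
The fixed-period transposition and the period of a compound
transposition can be sketched as follows. The permutation 2 3 1 5 4
is taken as the example key for d = 5; the function names and the
list representation of the message are assumptions of the sketch.

```python
from math import gcd

def transpose(message, perm):
    """Apply a fixed-period transposition to a list of letters.

    perm is a 1-based permutation of 1..d: position j of each output
    group receives letter perm[j] of the corresponding input group.
    """
    d = len(perm)
    if len(message) % d != 0:
        raise ValueError("message length must be a multiple of the period")
    out = []
    for start in range(0, len(message), d):
        group = message[start:start + d]
        out.extend(group[p - 1] for p in perm)
    return out

def compound_period(*periods):
    """Least common multiple of the individual periods."""
    result = 1
    for d in periods:
        result = result * d // gcd(result, d)
    return result

letters = [f"m{i}" for i in range(1, 11)]
print(" ".join(transpose(letters, [2, 3, 1, 5, 4])))
print(compound_period(2, 3))   # period of a period-2 then period-3 transposition
```

Applying a period-2 and then a period-3 transposition to twelve
letters gives a rearrangement that repeats every six positions, in
agreement with the least common multiple rule.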

3. Vigenere, and Variations.

In this cipher the key consists of a series of d
letters. These are written repeatedly below the message and the
two added modulo 26 (considering the alphabet numbered from
A = 0 to Z = 25). Thus

    $e_i = m_i + k_i \pmod{26}$

where $k_i$ is of period d in the index i.
For example with the key G A H we obtain

    message        N O W I S T H E ...
    repeated key   G A H G A H G A ...
    cryptogram     T O D O S A N E ...

The Vigenere of period 1 is called the Caesar cipher. It is a
simple substitution in which each letter of M is advanced a
fixed amount in the alphabet. This amount is the key, which
may be any number from 0 to 25. The so called Beaufort and
Variant Beaufort are similar to the Vigenere, and encipher by
the equations

    $e_i = k_i - m_i \pmod{26}$

    $e_i = m_i - k_i \pmod{26}$

respectively. The Beaufort of period one is called the
reversed Caesar cipher.
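
The three enciphering equations can be tried out with a short Python
sketch. The helper names are hypothetical; the arithmetic follows
the equations for the Vigenere, Beaufort and Variant Beaufort with
A = 0 to Z = 25.

```python
A = ord("A")

def _combine(message, key, rule):
    """Apply rule(m_i, k_i) mod 26 letter by letter, repeating the key."""
    out = []
    for i, letter in enumerate(message):
        m = ord(letter) - A
        k = ord(key[i % len(key)]) - A
        out.append(chr(rule(m, k) % 26 + A))
    return "".join(out)

def vigenere(message, key):
    return _combine(message, key, lambda m, k: m + k)    # e = m + k

def beaufort(message, key):
    return _combine(message, key, lambda m, k: k - m)    # e = k - m

def variant_beaufort(message, key):
    return _combine(message, key, lambda m, k: m - k)    # e = m - k

print(vigenere("NOWISTHE", "GAH"))    # TODOSANE
```

Two consequences of the equations are easy to confirm: the Beaufort
is its own inverse, and the Variant Beaufort deciphers the Vigenere
with the same key.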

The application of two or more Vigeneres in sequence
will be called the compound Vigenere. It has the equation

    $e_i = m_i + k_i + l_i + \ldots + s_i \pmod{26}$

where $k_i, l_i, \ldots, s_i$ in general have different periods.
The period of their sum

    $k_i + l_i + \ldots + s_i$

as in compound transposition, is the least common multiple of
the individual periods.

4. Vernam System.*

When the Vigenere is used with an unlimited key,
never repeating, we have the Vernam system, with

    $e_i = m_i + k_i \pmod{26}$

the $k_i$ being chosen at random and independently among 0, 1, ...,
25. If the key is a meaningful text we have the "running
key" cipher.

5. Bazeries Cylinder.

In this mechanical system 25 thick disks are used,
each having a mixed alphabet stamped around the edge. These
disks can be arranged in any order on a spindle, and the
particular arrangement used constitutes the key. With the disks
in their proper order, a message is enciphered by turning the
disks so that the message appears on a line parallel to the
axis of the spindle. Any other line of letters may then be
chosen for the cryptogram. To decipher, the cryptogram is
arranged on a line and the decipherer looks for another line
which then makes sense.

*G. S. Vernam, "Cipher Printing Telegraph Systems for Secret
Wire and Radio Telegraphic Communications," Journal Amer.
Inst. of Elect. Eng., v. XLV, pp. 109-115, 1926.


6. Digram, Trigram, and N-gram Substitution.

Rather than substitute for letters one can substitute
for digrams, trigrams, etc. General digram substitution
requires a key consisting of a permutation of the $26^2$ digrams.
It can be represented by a table in which the row corresponds
to the first letter of the digram and the column to the second
letter, entries in the table being the substitutes (usually
also digrams).

7. Interrupted Key Vigenere.

The Vigenere and its variations can be used with an
interrupted key. The sequence of key letters is started again
at irregularly spaced points. Thus, if the entire key sequence
is X P G H F T R S, one can interrupt irregularly to get

X  .P  OH  F  TI  H  X  P  Gfi  ?  lE'XPlPO  »  •  • 

The points of interruption can be determined in various ways:
(1) Whenever a certain letter occurs in the clear. (2)
Whenever a certain letter occurs in the cryptogram. (3) An
interrupting letter, say J, can be reserved as a signal and
the encipherer interrupts the key at his discretion. (4) No
signal is used and the decipherer locates the interruptions
by the appearance of meaningless text in the decipherment.
In place of starting the key again at each interruption one
can omit letters of it or reverse the direction of progress.
There are many variations and combinations of these methods.

8. Single Mixed Alphabet Vigenere.

This is a simple substitution followed by a
Vigenere.

    $e_i = f(m_i) + k_i$

    $m_i = f^{-1}(e_i - k_i)$

The "inverse" of this system is a Vigenere followed by simple
substitution

    $e_i = g(m_i + k_i)$

    $m_i = g^{-1}(e_i) - k_i$

9. Vigenere with Progressing Key.

The period of a Vigenere can be expanded by adding a
fixed number t to the key at each appearance; thus the n-th group
is enciphered by the equation

    $e_i = m_i + k_i + nt$

Also this can be varied by adding t and s alternately to the
key, etc.

10. Matrix System.*

One method of n-gram substitution is to operate on
successive n-grams with a matrix having an inverse. The letters
are assumed numbered from 0 to 25, making them elements of an
algebraic ring. From the n-gram $m_1 m_2 \ldots m_n$ of message, the
matrix $a_{ij}$ gives an n-gram of cryptogram

    $e_i = \sum_{j=1}^{n} a_{ij} m_j, \qquad i = 1, \ldots, n$

The matrix $a_{ij}$ is the key, and deciphering is performed with
the inverse matrix. The inverse matrix will exist if and only
if the determinant $|a_{ij}|$ has an inverse element in the ring.
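
For digrams (n = 2) the matrix system and its invertibility
condition can be sketched in a few lines of Python. The key matrix
below is an arbitrary illustrative choice, not one from the text,
and the function names are hypothetical.

```python
from math import gcd

N = 26  # letters numbered 0 to 25

def encipher_block(a, block):
    """e_i = sum_j a[i][j] * m_j (mod 26) for one n-gram."""
    n = len(a)
    return [sum(a[i][j] * block[j] for j in range(n)) % N for i in range(n)]

def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix mod 26, if the determinant is invertible."""
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % N
    if gcd(det, N) != 1:
        raise ValueError("determinant has no inverse element in the ring")
    det_inv = next(x for x in range(N) if (det * x) % N == 1)
    return [
        [(a[1][1] * det_inv) % N, (-a[0][1] * det_inv) % N],
        [(-a[1][0] * det_inv) % N, (a[0][0] * det_inv) % N],
    ]

key = [[3, 3], [2, 5]]      # illustrative key, determinant 9
digram = [7, 4]             # the letters H, E
cryptogram = encipher_block(key, digram)
print(cryptogram, encipher_block(inverse_2x2(key), cryptogram))
```

A matrix such as [[2, 0], [0, 1]] has determinant 2, which shares a
factor with 26, so no deciphering matrix exists for it.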

11. The Playfair Cipher.

This is a particular type of digram substitution
governed by a mixed 25 letter alphabet written in a 5 x 5
square. (The letter J is often dropped in cryptographic work;
it is very infrequent, and when it occurs can be replaced by I.)
Suppose the key square is as shown below

    L Z Q C P
    A G N O U
    R D M I F
    K Y H V S
    X B T E W

*See L. S. Hill, "Cryptography in an Algebraic Alphabet,"
American Math. Monthly, v. 36, No. 6, 1, 1929, pp. 306-312.
Also "Concerning Certain Linear Transformation Apparatus of
Cryptography," v. 38, No. 3, 1931, pp. 135-154.

The substitute for a digram AC, for example, is the pair of
letters at the other corners of the rectangle defined by A
and C, i.e. LO, the L taken first since it is above A. If the
digram letters are on a horizontal line as RI, one uses the
letters to their right DF; RF becomes DR. If the letters are
on a vertical line, the letters below them are used. Thus PS
becomes UW. If the letters are the same nulls may be used to
separate them or one may be omitted, etc.
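
The digram rules just described can be sketched in Python using the
key square of the example. The function name is hypothetical, and
wrapping from the bottom row back to the top in the vertical case is
an assumption made by analogy with the horizontal example RF to DR.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]

def playfair_digram(a, b, square=SQUARE):
    """Substitute for the digram ab under the rules described above."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    if a == b:
        raise ValueError("repeated letters need a null or an omission first")
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:
        # Same row: take the letters to the right, wrapping around.
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:
        # Same column: take the letters below (wrap-around assumed).
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    # Rectangle: each letter is replaced by the corner in its own column.
    return square[rb][ca] + square[ra][cb]

for digram in ("AC", "RI", "RF", "PS"):
    print(digram, "->", playfair_digram(*digram))
```

The four digrams reproduce the substitutes LO, DF, DR and UW given
in the description.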

12. Multiple Mixed Alphabet Substitution.

In this cipher there are a set of d simple
substitutions which are used in sequence. If the period d is four

    $m_1 m_2 m_3 m_4 m_5 m_6 \ldots$

becomes

    $f_1(m_1) f_2(m_2) f_3(m_3) f_4(m_4) f_1(m_5) f_2(m_6) \ldots$

13. Autokey Cipher.

A Vigenere type system in which either the message
itself or the resulting cryptogram is used for the "key" is
called an autokey cipher. The encipherment is started with
a "priming key" (which is the entire key in our sense) and
continued with the message or cryptogram displaced by the
length of the priming key as indicated below with the priming
key COMET. The message used as "key":

    Message      S E N D S U P P L I E S ...
    Key          C O M E T S E N D S U P ...
    Cryptogram   U S Z H L M T C O A Y H ...

The cryptogram used as "key":

    Message      S E N D S U P P L I E S ...
    Key          C O M E T U S Z H L O H ...
    Cryptogram   U S Z H L O H O S T S ...
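
Both autokey variants can be reproduced with a short Python sketch.
The function names are hypothetical; the addition is the Vigenere
addition modulo 26 with A = 0.

```python
A = ord("A")

def add_letters(m, k):
    """Vigenere addition of two letters modulo 26."""
    return chr((ord(m) - A + ord(k) - A) % 26 + A)

def autokey_message(message, priming_key):
    """Key = priming key followed by the message itself."""
    key = (priming_key + message)[:len(message)]
    return "".join(add_letters(m, k) for m, k in zip(message, key))

def autokey_cryptogram(message, priming_key):
    """Key = priming key followed by the cryptogram produced so far."""
    key = list(priming_key)
    out = []
    for i, m in enumerate(message):
        e = add_letters(m, key[i])
        out.append(e)
        key.append(e)   # each cryptogram letter becomes later key material
    return "".join(out)

print(autokey_message("SENDSUPPLIES", "COMET"))
print(autokey_cryptogram("SENDSUPPLIES", "COMET"))
```

The first five cryptogram letters agree in the two variants, since
both start from the same priming key.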
14. Fractional Ciphers.

In these, each letter is first enciphered into two
or more letters or numbers and these symbols are somehow mixed
(e.g. by transposition). The result may then be retranslated
into the original alphabet. Thus using a mixed 25 letter
alphabet for the key we may translate letters into two digit
quinary numbers by the table

         0  1  2  3  4
    0    L  Z  Q  C  P
    1    A  G  N  O  U
    2    R  D  M  I  F
    3    K  Y  H  V  S
    4    X  B  T  E  W

Thus B becomes 41. After the resulting series of numbers is
transposed in some way they are taken in pairs and translated
back into letters.
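
A minimal sketch of a fractional cipher in Python, using the table
above. The text leaves the mixing step open ("transposed in some
way"); the cyclic rotation of the digit sequence used here is purely
an assumption for illustration, as are the function names.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]
POSITION = {ch: (r, c) for r, row in enumerate(SQUARE) for c, ch in enumerate(row)}

def to_digits(message):
    """Translate each letter into its two quinary digits (row, column)."""
    digits = []
    for letter in message:
        digits.extend(POSITION[letter])
    return digits

def to_letters(digits):
    """Take the digits in pairs and translate back into letters."""
    return "".join(SQUARE[r][c] for r, c in zip(digits[0::2], digits[1::2]))

def rotate(digits, shift):
    """Assumed mixing step: rotate the digit sequence cyclically."""
    if not digits:
        return digits
    shift %= len(digits)
    return digits[shift:] + digits[:shift]

def fractional_encipher(message, shift=1):
    return to_letters(rotate(to_digits(message), shift))

def fractional_decipher(cryptogram, shift=1):
    return to_letters(rotate(to_digits(cryptogram), -shift))

print(to_digits("B"))               # [4, 1]
print(fractional_encipher("BE"))    # digits 4 1 4 3 become 1 4 3 4
```

Any invertible rearrangement of the digits could replace the
rotation; the point is only that each cryptogram letter then depends
on fractions of two different message letters.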

15. Codes.

In codes words (or sometimes syllables) are replaced
by substitute letter groups. Sometimes a cipher of one kind or
another is applied to the result.

12. Valuations of Secrecy Systems

There are a number of different criteria that should
be applied in estimating the value of a proposed secrecy system.
The more important of these are:

1. Amount of Secrecy.

There are some systems that are perfect: the enemy
is no better off after intercepting any amount of material than
before. Other systems, although giving him some information,
do not yield a unique "solution" to intercepted cryptograms.
Among the uniquely solvable systems, there are wide variations
in the amount of labor required to effect this solution and in
the amount of material that must be intercepted to make the
solution unique.

2. Size of Key.

The key must be transmitted by non-interceptible
means from transmitting to receiving ends. Sometimes it must
be memorized. It is desirable then to have the key as small
as possible.

3. Complexity of Enciphering and Deciphering Operations.

These should, of course, be as simple as possible.
If they are done manually, complexity leads to loss of time,
errors, etc. If done mechanically, complexity leads to large
expensive machines.

4. Propagation of Errors.

In certain types of secrecy systems an error of one
letter in enciphering or transmission leads to a large amount
of error in the deciphered text. The errors are spread out by
the deciphering operation, causing the loss of much information
and frequent need for repetition of the cryptogram. It is
naturally desirable to minimize this error expansion.

5. Expansion of Message.

In some types of secrecy systems the size of the
message is increased by the enciphering process. This
undesirable effect may be seen in systems where one attempts to
swamp out message statistics by the addition of many nulls, or
where multiple substitutes are used. It also occurs in many
"concealment" types of systems (which are not usually secrecy
systems in the sense of our definition).

13. Equivalence Classes in the Key Space

It may happen that in a ciphering system two or more
different keys, say keys 1, 2, and 7, are equivalent. By this
we mean that for every M

    $T_1 M = T_2 M = T_7 M$

These keys will not be considered as distinct but will be thrown
into an equivalence class. It is clear that the cryptanalyst
can never determine which particular one of these was used but
only (at best) the class. The probability for the class is of
course the sum of the probabilities of the different keys in
the class.


As an example, in the Playfair cipher described
above, the following are equivalent key squares.

    G H X P Y        E C I Z F
    Z F E C I        N R D L O
    L O N R D        V S Q T A
    T A V S Q        W B M K U
    K U W B M        X P Y G H

We can think of the possible equivalence classes in this case
as arrangements of a 25 letter alphabet on a 5 x 5 square
on an oriented torus. The number of different keys is not 25!
but $25!/5^2 = 24!$
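
The torus picture can be checked by brute force: shifting the rows
and columns of a key square cyclically should leave every digram
substitute unchanged. The sketch below assumes the first square
reads G H X P Y / Z F E C I / L O N R D / T A V S Q / K U W B M and
uses the Playfair rules as described earlier (with wrap-around in
both directions, an assumption); the function names are hypothetical.

```python
from math import factorial

def playfair_digram(a, b, square):
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:      # same row: letters to the right
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:      # same column: letters below
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    return square[rb][ca] + square[ra][cb]   # other corners of the rectangle

def shift_square(square, dr, dc):
    """Cyclically shift the square by dr rows and dc columns."""
    return ["".join(square[(r + dr) % 5][(c + dc) % 5] for c in range(5))
            for r in range(5)]

SQ1 = ["GHXPY", "ZFECI", "LONRD", "TAVSQ", "KUWBM"]
SQ2 = shift_square(SQ1, 1, 2)
letters = "".join(SQ1)
equivalent = all(
    playfair_digram(a, b, SQ1) == playfair_digram(a, b, SQ2)
    for a in letters for b in letters if a != b
)
print(SQ2, equivalent)
print(factorial(25) // 5**2 == factorial(24))
```

Each square has 25 cyclic shifts (5 row shifts times 5 column
shifts), all giving the same transformation, which is where the
division by 5 squared comes from.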

When we say that two secrecy systems are the same we
mean that they consist of the same set of transformations $T_i$,
with the same message and cryptogram space (range and domain)
and the same probabilities for the different keys (after all
identical transformations are put in the same equivalence
class).

14.    The  Algebra  of  Secrecy  Systems 

If we have two secrecy systems T and R we can often
combine them in various ways to form a new secrecy system S.
If T and R have the same domain (message space) we may form a
kind of "weighted sum,"

    $S = pT + qR$

where $p + q = 1$. This operation consists of first making a
preliminary choice with probabilities p and q determining
which of T and R is used. This choice is part of the key of S.
After this is determined T or R is used as originally defined.
The total key of S must specify which of T and R is used and
which key of T (or R) is used.

If T consists of the transformations $T_1, \ldots, T_m$
with probabilities $p_1, \ldots, p_m$ and R consists of $R_1, \ldots,
R_k$ with probabilities $q_1, \ldots, q_k$ then $S = pT + qR$ consists
of the transformations $T_1, T_2, \ldots, T_m, R_1, \ldots, R_k$ with
probabilities $pp_1, pp_2, \ldots, pp_m, qq_1, qq_2, \ldots, qq_k$
respectively.
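
The bookkeeping of the weighted sum can be sketched as follows. A
system is represented here simply as a list of (probability,
transformation name) pairs, which is an assumption of the sketch;
exact fractions are used so the probabilities can be compared
without rounding.

```python
from fractions import Fraction as F

def weighted_sum(p, T, q, R):
    """Form S = pT + qR from two lists of (probability, name) pairs."""
    if p + q != 1:
        raise ValueError("p + q must equal 1")
    return [(p * pi, name) for pi, name in T] + [(q * qi, name) for qi, name in R]

T = [(F(1, 2), "T1"), (F(1, 2), "T2")]
R = [(F(1, 3), "R1"), (F(2, 3), "R2")]
S = weighted_sum(F(1, 4), T, F(3, 4), R)
for probability, name in S:
    print(name, probability)
```

The resulting key probabilities again sum to one, as they must for S
to be a secrecy system.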
More generally we can form the sum of a number of
systems.

    $S = p_1 T + p_2 R + \ldots + p_m U, \qquad \sum p_i = 1$

We note that any system T can be written as a sum of fixed
operations

    $T = p_1 T_1 + p_2 T_2 + \ldots + p_m T_m$

$T_i$ being a definite enciphering operation of T corresponding
to key choice i, which has probability $p_i$.

A second way of combining two secrecy systems is by
taking the "product", shown schematically in Fig. 8. Suppose
T and R are two systems and the domain (language space) of T
can be identified with the range (cryptogram space) of R. Then
we can apply first R to our language and then T to the result
of this enciphering process. This gives a resultant operation S
which we write as a product

    $S = TR$

The key for S consists of both keys of T and R which are assumed
chosen according to their original probabilities and
independently. Thus if the m keys of T are chosen with probabilities

    $p_1, p_2, \ldots, p_m$

and the n keys of R have probabilities

    $p_1', p_2', \ldots, p_n'$

then S has mn keys (at most; there may and often will be
equivalence classes) with probabilities $p_i p_j'$. This type of
product encipherment is often used; for example one
follows a substitution by a transposition or a transposition
by a Vigenere, or applies a code to the text and enciphers
the result by substitution, transposition, fractionation, etc.
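
The product, including the merging of key pairs into equivalence
classes, can be sketched on a toy alphabet of three letters. A
system is represented as a dictionary from a transformation (a
tuple giving the image of each letter) to its probability; this
representation and the names used are assumptions of the sketch.

```python
from fractions import Fraction as F
from itertools import permutations

def product(T, R):
    """S = TR: apply R first, then T; key pairs are chosen independently."""
    S = {}
    for t, p in T.items():
        for r, q in R.items():
            s = tuple(t[x] for x in r)       # s(m) = t(r(m))
            S[s] = S.get(s, 0) + p * q       # identical results share a class
    return S

n = 3
# T: Vigenere of period 1 on three letters, each shift equally likely.
shifts = {tuple((x + k) % n for x in range(n)): F(1, n) for k in range(n)}
# R: simple substitution on three letters, each permutation equally likely.
substitutions = {perm: F(1, 6) for perm in permutations(range(n))}

S = product(shifts, substitutions)
print(len(shifts) * len(substitutions), "key pairs,", len(S), "distinct transformations")
```

Here the 18 key pairs collapse into 6 equivalence classes, each of
probability 1/6, illustrating the remark that S has at most mn keys.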

A more special type of product may be defined in
case both T and R have keys of the same size which may be put
in one-to-one correspondence with the same probabilities for
corresponding keys. This may be called the "inner product,"
in contrast with the above which may be more completely
described as an "outer product" (these names are derived from
a rough analogy with the concepts of tensor analysis). In
the inner product, written

    $S = T \circ R$

and indicated schematically in Fig. 9, the same key (or
corresponding keys) are used for both T and R chosen with the
common probability.

For example one may construct a transposition cipher
whose key is a permutation of the alphabet, each permutation
being equally likely, and apply first this and then a
substitution based on the same permutation. One also sees this
situation in certain geometrical types of transposition ciphers
where the text is written into a square and a permutation based
on a key word applied first to the columns and then the rows
of the square.

It may be noted that multiplication (either kind) is
not in general commutative (we do not always have $RS = SR$)
although in special cases such as substitution and transposition
it is. Since it represents an operation it is definitionally
associative. That is $R(ST) = (RS)T = RST$. Furthermore we have
the laws

    $p(p'T + q'R) + qS = pp'T + pq'R + qS$
    (weighted associative law for addition)

    $T(pR + qS) = pTR + qTS$
    $(pR + qS)T = pRT + qST$
    (right and left hand distributive laws)

and

    $p_1 T + p_2 T + p_3 R = (p_1 + p_2)T + p_3 R$

Finally with regard to this algebraic structure of
secrecy operations, we note that every closed secrecy system
has an "inverse" T' obtained by interchanging the E and M
spaces, with key probabilities the same, and

    (T R S)' = S' R' T'

    (p T + q R)' = p T' + q R'

Note that T T' is not in general the identity (this is the
reason we do not write T^{-1}).

A system whose M and E spaces can be identified,
a very common case as when letter sequences are transformed
into letter sequences, may be termed endomorphic. An
endomorphic system T may be raised to a power T^n.

A secrecy system T whose outer product with itself
is equal to T, i.e. for which

    T T = T

will be called idempotent. For example simple substitution,
transposition of period p, Vigenère of period p (all with each
key equally likely) are idempotent.
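Idempotence is easy to check by brute force on a small alphabet. The sketch below is only an illustration under assumed conventions: a 4-letter alphabet, systems stored as dictionaries from transformations to probabilities, and hypothetical helper names `square` and `uniform`.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

def square(system):
    """Outer product of a system with itself (two independent key choices)."""
    out = defaultdict(Fraction)
    for t, pt in system.items():
        for r, pr in system.items():
            out[tuple(r[t[x]] for x in range(len(t)))] += pt * pr
    return dict(out)

def uniform(transformations):
    """Give every listed transformation the same probability."""
    ts = list(transformations)
    return {t: Fraction(1, len(ts)) for t in ts}

# simple substitution on 4 letters: all 24 permutations equally likely
substitution = uniform(permutations(range(4)))
# Caesar on 4 letters: all 4 shifts equally likely
caesar = uniform(tuple((x + k) % 4 for x in range(4)) for k in range(4))
# Caesar restricted to shifts 0 and 1 only: the keys no longer form a group
partial = uniform(tuple((x + k) % 4 for x in range(4)) for k in (0, 1))

print(square(substitution) == substitution)  # idempotent
print(square(caesar) == caesar)              # idempotent
print(square(partial) == partial)            # not idempotent
```

The restricted system fails because its square contains a shift of 2, which is not among its own keys.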

The set of all endomorphic secrecy systems defined in
a fixed message space constitute an "algebraic variety," that
is, a kind of algebra, using the operations of addition and
multiplication. In fact, the properties of addition and
multiplication which we have discussed lead to the following
result:

Theorem 1:    The set of endomorphic ciphers with the same
              message space and the two combining operations
              of weighted addition and outer multiplication
              form a linear associative algebra with a unit
              element, apart from the fact that the
              coefficients in a weighted addition must be
              non-negative and sum to unity.

It should be emphasized that these combining
operations of addition and multiplication apply to secrecy
systems as a whole. The product of two systems TR should not be
confused with the product of the transformations in the systems
T_i R_j, which also appears often in this work. The former TR
is a secrecy system, i.e. a set of transformations with
associated probabilities; the latter is a particular
transformation. Further the sum of two systems pR + qT is a
system; the sum of two transformations is not defined. The
systems T and R may commute without the individual T_i and R_j
commuting, e.g. if R is a Beaufort system of a given period,
all keys equally likely,

    R_i R_j ≠ R_j R_i

in general, but of course RR does not depend on its order;
actually

    RR = V

the Vigenère of the same period with random key. On the other
hand, if the individual T_i and R_j of two systems T and R
commute, then the systems commute.
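The Beaufort remark can be verified directly for period 1. This is a small sketch, not taken from the memorandum, which assumes the usual conventions e = k - m (mod 26) for Beaufort and e = m + k (mod 26) for Vigenère; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

N = 26

def beaufort(k):
    """Period-1 Beaufort transformation m -> k - m (mod 26), as a tuple."""
    return tuple((k - m) % N for m in range(N))

def vigenere(k):
    """Period-1 Vigenere transformation m -> m + k (mod 26), as a tuple."""
    return tuple((m + k) % N for m in range(N))

def compose(second, first):
    """Transformation that applies `first`, then `second`."""
    return tuple(second[first[m]] for m in range(N))

# two individual Beaufort transformations generally fail to commute
r3, r5 = beaufort(3), beaufort(5)
individually_commute = compose(r3, r5) == compose(r5, r3)

# the system RR: both keys chosen independently and uniformly
rr = Counter()
for i in range(N):
    for j in range(N):
        rr[compose(beaufort(i), beaufort(j))] += Fraction(1, N * N)

v = {vigenere(k): Fraction(1, N) for k in range(N)}
print(individually_commute, dict(rr) == v)
```

Composing two Beaufort keys i and j gives a shift by i - j, so the 676 key pairs collapse onto the 26 equally likely Vigenère shifts.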

It is rather surprising to find an algebraic variety
with as much structure as a linear associative algebra in which
the elements have the complexity of ciphers. In Hilbert space
theory, for example, one has a linear associative algebra,
but the elements of the algebra are transformations. Here the
elements are sets of transformations with a probability space
associated with the transformation parameter.

These combining operations give us ways of constructing
many new types of secrecy systems from certain ones,
such as the examples given. We may also use them to describe
the situation facing a cryptanalyst when attempting to solve a
cryptogram of unknown type. He is, in fact, solving a secrecy
system of the type

    T = p_1 A + p_2 B + ... + p_r S + p' X,      Σ p = 1

where the A, B, ..., S are known types of ciphers, with the p_i
their a priori probabilities in this situation, and p' X
corresponds to the possibility of a completely new unknown type
of cipher.

In weighted addition the key size of the result is
given by

    |K| = p |K_1| + q |K_2| - (p log p + q log q)

        = p |K_1| + q |K_2| + |K_3|

i.e. the weighted mean of the two keys plus the size of the
p, q key. This is only in case there are no equivalences;
if there are it will always be less.
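The weighted-addition formula can be checked numerically. The sketch below assumes that the key size |K| is measured as the entropy of the key distribution in bits; the key counts 26 and 120 and the weights 0.3 and 0.7 are hypothetical values chosen only for illustration.

```python
from math import isclose, log2

def key_size(probabilities):
    """Key size taken here as the entropy -sum p log2 p, in bits (an assumption)."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

n1, n2 = 26, 120   # hypothetical numbers of equally likely keys in T1 and T2
p, q = 0.3, 0.7    # hypothetical weights in the sum p T1 + q T2

k1 = key_size([1 / n1] * n1)
k2 = key_size([1 / n2] * n2)
k3 = key_size([p, q])          # size of the "p, q key"

# key probabilities of p T1 + q T2 when no two keys are equivalent
combined = [p / n1] * n1 + [q / n2] * n2
lhs = key_size(combined)
rhs = p * k1 + q * k2 + k3
print(lhs, rhs)
```

The two printed values agree to floating-point accuracy, which is the identity |K| = p|K_1| + q|K_2| + |K_3| for the case of no equivalences.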

For the outer product the key size is

    |K| ≤ |K_1| + |K_2|

with equality only when there are no equivalences. In the
inner product

    |K| ≤ |K_1| = |K_2|

with equality under the same condition.

15.    Pure and Mixed Ciphers

Certain types of ciphers, such as the simple
substitution, the transposition of a given period, the Vigenère
of a given period, the mixed alphabet Vigenère, etc. (all
with each key equally likely) have a certain homogeneity with
respect to key. Whatever the key, the enciphering, deciphering
and decrypting processes are essentially the same. This
may be contrasted with the cipher

    p S + q T

where S is a simple substitution and T a transposition of a
given period. In this case the entire system changes for
enciphering, deciphering and decryptment, depending on whether
the substitution or transposition was used.

The cause of the homogeneity in certain ciphers
stems from the group property; we notice that in the above
examples of homogeneous ciphers the product of any two
transformations in the set T_i T_j is equal to a third
transformation T_k in the set, while T_i S_j does not equal any
transformation in the cipher

    p S + q T

which contains only substitutions and transpositions, no
products.

We might define a "pure" cipher, then, as one whose
T_i formed a group. This, however, would be too restrictive
since it requires that the E space be the same as the M space,
i.e. that the system be endomorphic. The fractional
transposition is as homogeneous as the ordinary transposition
without being endomorphic. The proper definition is the following:
A cipher T is pure if for every T_i, T_j, T_k there is a T_s such
that

    T_i T_j^{-1} T_k = T_s

and every key is equally likely. Otherwise the cipher is mixed.
The systems of Fig. 7 are mixed. Fig. 10 is pure if all keys
are equally likely.
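The definition of a pure cipher translates into a direct test for small systems. The following is a minimal sketch under assumptions: the systems are endomorphic on a 3-letter alphabet for brevity, transformations are stored as tuples, and the helper names `inverse`, `compose` and `is_pure` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def inverse(t):
    """Inverse of a one-to-one transformation given as a tuple."""
    inv = [0] * len(t)
    for x, y in enumerate(t):
        inv[y] = x
    return tuple(inv)

def compose(*ts):
    """compose(a, b, c) applies c first, then b, then a (operator order)."""
    def apply(x):
        for t in reversed(ts):
            x = t[x]
        return x
    return tuple(apply(x) for x in range(len(ts[0])))

def is_pure(system):
    """system: dict {transformation: probability} on a common finite space."""
    keys = set(system)
    equally_likely = len(set(system.values())) == 1
    closed = all(compose(ti, inverse(tj), tk) in keys
                 for ti, tj, tk in product(keys, repeat=3))
    return equally_likely and closed

subst = {t: Fraction(1, 6) for t in permutations(range(3))}
one_key = {(1, 2, 0): Fraction(1)}                     # not a group, still pure
skewed = {(0, 1, 2): Fraction(1, 2), (1, 2, 0): Fraction(1, 4),
          (2, 0, 1): Fraction(1, 4)}                   # a group, unequal keys
not_closed = {t: Fraction(1, 3) for t in [(0, 1, 2), (1, 0, 2), (1, 2, 0)]}

print(is_pure(subst), is_pure(one_key), is_pure(skewed), is_pure(not_closed))
```

The single-key system shows that purity does not require the transformations themselves to contain the identity; only the closure condition and equal key probabilities matter.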

Theorem 2:    In a pure cipher the operations T_i^{-1} T_j which
              transform the message space into itself form a
              group whose order is m, the number of different
              keys.

For

    T_j^{-1} T_k T_k^{-1} T_j = I

so that each element has an inverse, also the associative
law is true since these are operations, and the group
property follows from

    T_i^{-1} T_j T_k^{-1} T_l = T_s^{-1} T_k T_k^{-1} T_l = T_s^{-1} T_l

using our assumption that T_i^{-1} T_j = T_s^{-1} T_k for some s.

The operation T_i^{-1} T_j means, of course, enciphering
the message with key j and then deciphering with key i which
brings us back to the message space. If T is endomorphic,
i.e. the T_i themselves transform the space Ω_M into itself (as
is the case with most ciphers, where both the message space
and the cryptogram space consist of sequences of letters),
and the T_i are a group and equally likely, then T is pure,
since

    T_i T_j^{-1} T_k = T_i T_r = T_s.

Theorem 3:    The outer product of two pure ciphers which
              commute is pure.

For if T and R commute T_i R_j = R_l T_m for every i, j with
suitable l, m, and

    T_i R_j (T_k R_l)^{-1} T_m R_n = T_i R_j R_l^{-1} T_k^{-1} T_m R_n
                                   = R_u R_v^{-1} R_w T_r T_s^{-1} T_t
                                   = R_h T_g.

The commutation condition is not necessary, however, for the
product to be a pure cipher.

A system with only one key, i.e. a single definite
operation T_1, is pure, since the only choice of indices is

    T_1 T_1^{-1} T_1 = T_1.

Thus the expansion of a general cipher into a sum of such
simple transformations also exhibits it as a sum of pure
ciphers.

An examination of the example of a pure cipher
shown in Fig. 5 discloses certain properties. The messages
fall into certain subsets which we will call residue classes,
and the possible cryptograms are divided into corresponding
residue classes. There is at least one line from each
message in a class to each cryptogram in the corresponding class,
and no line between classes which do not correspond. The
number of messages in a class is a divisor of the total
number of keys. The number of lines "in parallel" from a
message M to a cryptogram in the corresponding class is equal
to the number of keys divided by the number of messages in
the class containing the message (or cryptogram). It is shown
in the appendix that these hold in general for pure ciphers.
Summarized in a more formal statement we have

Theorem 4:    In a pure system the messages can be divided into
              a set of "residue classes" C_1, C_2, ..., C_s and
              the cryptograms into a corresponding set of
              residue classes C'_1, C'_2, ..., C'_s with the
              following properties:

       (1)    The message residue classes are mutually
              exclusive and collectively contain all
              possible messages. Similarly for the
              cryptogram residue classes.

       (2)    Enciphering any message in C_i with any key
              produces a cryptogram in C'_i. Deciphering
              any cryptogram in C'_i with any key leads
              to a message in C_i.

       (3)    The number of messages in C_i, say φ_i, is
              equal to the number of cryptograms
              in C'_i and is a divisor of k, the number
              of keys.

       (4)    Each message in C_i can be enciphered into
              each cryptogram in C'_i by exactly k/φ_i
              different keys. Conversely
              for decipherment.

The importance of the concept of a pure cipher (and
the reason for the name) lies in the fact that for them all
keys are essentially the same. Whatever key is used for
a particular message, the a posteriori probabilities of all
messages are identical. To see this, note that two different
keys applied to the same message lead to two cryptograms in
the same residue class, say C'_i. The two cryptograms
therefore could each be deciphered by k/φ_i keys into each message
in C_i and into no other possible messages. All keys being
equally likely the a posteriori probabilities of various
messages are thus

    P_E(M) = P(M) P_M(E) / P(E)
           = P(M) P_M(E) / Σ_M P(M) P_M(E)
           = P(M) / P(C_i)

where M is in C_i, E is in C'_i and the sum is over all messages
in C_i. If E and M are not in corresponding residue classes
P_E(M) = 0. Similarly it can be shown that the a posteriori
probabilities of the different keys are the same in value but
these values are associated with different keys when a different
key is used. The same set of values of P_E(K) have undergone
a permutation among the keys. Thus we have the result:

Theorem 5:    In a pure system the a posteriori probabilities
              of various messages P_E(M) are independent of the
              key that is chosen. The a posteriori
              probabilities of the keys P_E(K) are the same in value
              but undergo a permutation with a different key
              choice.
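The residue-class properties of Theorem 4 can be checked by enumeration on toy systems. The sketch below is an illustration only, with two assumed examples (a Caesar cipher on two-letter words over a 3-letter alphabet, and simple substitution on single letters); keys are stored as dictionaries from messages to cryptograms, and the function names are hypothetical.

```python
from itertools import permutations

def residue_classes(keys, messages):
    """Return (message_class, cryptogram_class) pairs for a pure cipher.

    keys: list of dicts message -> cryptogram, each one-to-one.
    """
    inverses = [{e: m for m, e in t.items()} for t in keys]
    classes, seen = [], set()
    for m in messages:
        if m in seen:
            continue
        # encipher with one key, decipher with another: operations T_i^{-1} T_j
        msg_class = {inv[t[m]] for t in keys for inv in inverses if t[m] in inv}
        cry_class = {t[x] for t in keys for x in msg_class}
        seen |= msg_class
        classes.append((msg_class, cry_class))
    return classes

def parallel_lines(keys, m, e):
    """Number of keys that encipher message m into cryptogram e."""
    return sum(1 for t in keys if t[m] == e)

# Caesar on two-letter words over a 3-letter alphabet: k = 3 keys
words = [(a, b) for a in range(3) for b in range(3)]
caesar_keys = [{(a, b): ((a + s) % 3, (b + s) % 3) for a, b in words}
               for s in range(3)]

# simple substitution on single letters of a 3-letter alphabet: k = 6 keys
letters = [0, 1, 2]
subst_keys = [dict(zip(letters, perm)) for perm in permutations(letters)]

for mc, cc in residue_classes(caesar_keys, words):
    print(sorted(mc), "->", sorted(cc))
```

In the Caesar example there are three classes of three words each (k/φ = 1 line per message-cryptogram pair); in the substitution example there is one class of three letters, and each pair is joined by 6/3 = 2 keys.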

Roughly we may say that any key choice leads to
the same cryptanalytic problem in a pure cipher. Since the
different keys all result in cryptograms in the same residue
class this means that all cryptograms in the same residue
class are cryptanalytically equivalent; they lead to the same
a posteriori probabilities of messages and, apart from a
permutation, the same probabilities of keys.

As an example of this, simple substitution with
all keys equally likely is a pure cipher. The residue class
corresponding to a given cryptogram E is the set of all
cryptograms that may be obtained from E by operations T_j T_k^{-1} E.
In this case T_j T_k^{-1} is itself a substitution and hence any
substitution on E gives another member of the same residue
class. Thus if the cryptogram is

    E = X C P P G C F Q

then

    E_1 = R D H H G D S N

    E_2 = A B C C D B E F

etc. are in the same residue class. It is obvious in this
case that these cryptograms are essentially equivalent.
All that is of importance in a simple substitution with
random key is the pattern of letter repetitions, the actual
letters being dummy variables. Indeed we might dispense
with them entirely, indicating the pattern of repetitions
in E as follows:*


This notation describes the residue class but eliminates all
information as to the specific member of the class. Thus it
leaves precisely that information which is cryptanalytically
pertinent. This is related to one method of attacking simple
substitution ciphers, the method of pattern words.
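The pattern of letter repetitions can be computed by relabelling letters in order of first appearance. This is a small sketch of that idea, not the notation of the memorandum; the function name `repetition_pattern` is hypothetical.

```python
from string import ascii_uppercase

def repetition_pattern(cryptogram):
    """Relabel letters in order of first appearance: first new letter -> A, etc."""
    labels = {}
    for ch in cryptogram:
        if ch not in labels:
            labels[ch] = ascii_uppercase[len(labels)]
    return "".join(labels[ch] for ch in cryptogram)

# two cryptograms from the text above that lie in the same residue class
print(repetition_pattern("RDHHGDSN"))
print(repetition_pattern("ABCCDBEF"))
```

Two cryptograms belong to the same residue class of simple substitution exactly when this function returns the same string for both, which is the basis of the pattern-word method.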

*Suggested by a notation used by Quine in Symbolic Logic.

In the Caesar type cipher only the first
differences mod 26 of the cryptogram are significant. Two
cryptograms with the same Δe_i are in the same residue class. One
breaks this cipher by the simple process of writing down the
26 members of the message residue class and picking out the
one which makes sense.

The Vigenère of period d with random key is another
example of a pure cipher. Here the message residue class
consists of all sequences with the same first differences for
letters separated by distance d as the cryptogram. For
d = 3 the residue class is defined by

    m_1 - m_4 = e_1 - e_4
    m_2 - m_5 = e_2 - e_5
    m_3 - m_6 = e_3 - e_6
    m_4 - m_7 = e_4 - e_7
    ...

where E = e_1, e_2, ... is the cryptogram and m_1, m_2, ... is any
M in the corresponding residue class.
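The invariance of differences at distance d can be confirmed numerically. The sketch below assumes letters are coded as integers 0 to 25 and that the Vigenère adds the repeating key mod 26; the helper names and the random test message are hypothetical.

```python
import random

N = 26

def vigenere_encipher(message, key):
    """message and key are lists of integers 0..25; the key repeats with period len(key)."""
    d = len(key)
    return [(m + key[i % d]) % N for i, m in enumerate(message)]

def differences(seq, d):
    """Differences between elements d apart, mod 26."""
    return [(seq[i] - seq[i + d]) % N for i in range(len(seq) - d)]

rng = random.Random(0)          # fixed seed so the run is repeatable
d = 3
message = [rng.randrange(N) for _ in range(20)]
e1 = vigenere_encipher(message, [rng.randrange(N) for _ in range(d)])
e2 = vigenere_encipher(message, [rng.randrange(N) for _ in range(d)])

# the key cancels in every difference taken at distance d
print(differences(message, d) == differences(e1, d) == differences(e2, d))
```

The Caesar cipher is the case d = 1, where the ordinary first differences of the cryptogram are independent of the key.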

In the transposition cipher of period d with random
key, the residue class consists of all arrangements of the e_i
in which no e_i is moved out of its block of length d, and any
two e_i at a distance d remain at this distance. This is used
in breaking these ciphers as follows. The cryptogram is written
in successive blocks of length d, one under another as below
(d = 5):

    e_1    e_2    e_3    e_4    e_5
    e_6    e_7    e_8    e_9    e_10
    e_11   e_12   .      .      .
    .      .      .      .      .

The columns are then cut apart and rearranged to make sense.
When the columns are cut apart, the only information remaining
is the residue class of the cryptogram.

Theorem 6:    If T is pure then T_i T_j^{-1} T = T where
              T_i, T_j are any two transformations of T.
              Conversely if this is true for any T_i, T_j in a
              system T then T is pure.

The first part of this theorem is obvious from the
definition of a pure system. To prove the second part we note
first that if T_i T_j^{-1} T = T then T_i T_j^{-1} T_s is a
transformation of T. It remains to show that all keys are
equiprobable. We have T = Σ_s p_s T_s and

    Σ_s p_s T_i T_j^{-1} T_s = Σ_s p_s T_s.

The term in the left hand sum with s = j yields p_j T_i.
The only term in T_i on the right is p_i T_i. Since all
coefficients are non-negative it follows that

    p_j ≤ p_i.

The same argument holds with i and j interchanged and
consequently

    p_j = p_i

and T is pure. Thus the condition that T_i T_j^{-1} T = T might
be used as an alternative definition of a pure system.

The property of purity in a system is connected with
idempotence. Thus consider the system S = T T' where T is
pure. We have

    T_i T_j^{-1} T_s T_t^{-1} = T_i T_i^{-1} T_r T_t^{-1} = T_r T_t^{-1}

so that the transformations of S^2 are the same as those of S,
and since both S and S^2 are pure we have

    S = S^2

Theorem 7:    If T is pure S = T T' is pure and S^2 = S.

An endomorphic system T which satisfies the condition
T_i T_j = T_s (but not necessarily with all key probabilities
equal) can be shown to approach a pure cipher on raising to a
high power, namely the one with the same transformations, but
with all probabilities equalized. In fact the probabilities
for T^{n+1} are derived from those for T^n by a Markoff process,
of a special type due to the group property. This special
type always approaches the limit of equalized probabilities.
This same argument applies more generally. We have

Theorem 8:    Let T be any endomorphic cipher. If T^n approaches
              any limit at all, which will necessarily occur if
              all the transformations of T^n lie in a finite set
              (no matter how large n) and the transformations of
              T include the identity, then this limit will be a
              pure cipher.

As an example consider the cipher

    R = p T + q S

where T is transposition with random key and S substitution
with random key. We have

    S^2 = S

    T^2 = T

    S T = T S

and hence any product of T's and S's such as T S T = T T S S
reduces to S T. Thus

    R^n = p^n T + q^n S + (1 - p^n - q^n) S T

As n → ∞ the first two terms approach zero and

    lim R^n = S T
    n → ∞
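The coefficients of R^n follow from the three reduction rules alone, and this can be checked by iterating the multiplication. The sketch below is an illustration with a hypothetical function name and an arbitrary choice p = 1/3; it tracks only the coefficients of T, S and ST.

```python
from fractions import Fraction

def power_coefficients(p, n):
    """Coefficients (a, b, c) of R^n = a T + b S + c ST for R = p T + q S,
    using only the rules T T = T, S S = S, S T = T S."""
    q = 1 - p
    a, b, c = p, q, Fraction(0)
    for _ in range(n - 1):
        # multiply (a T + b S + c ST) by (p T + q S) and collect terms
        a, b, c = a * p, b * q, a * q + b * p + c * (p + q)
    return a, b, c

p = Fraction(1, 3)
for n in (1, 2, 6, 20):
    a, b, c = power_coefficients(p, n)
    print(n, float(a), float(b), float(c))
```

The weight on S T grows toward 1 as n increases, in agreement with the closed form 1 - p^n - q^n.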

The concepts of pure and mixed languages and pure
and mixed ciphers have an application in practical
cryptanalysis, if we interpret them somewhat loosely. When a
cryptographer starts work on a cryptogram, his first job is to
determine the original language. Approximately then he is
determining the pure component of the general language space

    L = p_1 L_1 + p_2 L_2 + ... + p_n L_n

where L_1 say is English, L_2 German, etc. Of course these are
not pure but the different components of them are fairly close
together in statistical structure.

The second thing a cryptographer does is to
determine the "type" of cipher that was used; usually this is
about the same as finding the pure component in the general
cipher system

    R = p_1 S + p_2 T + p_3 V + ...

where S say is simple substitution, T is transposition, etc.
A Vigenère V of unknown period is not a pure cipher but the
decomposition

    V = p_1 V_1 + p_2 V_2 + p_3 V_3 + ...

where V_i is of period i, is into pure components (if all keys
are equally likely for any period). In solving a Vigenère
the first problem is to determine the period. The same is
true in transposition.

The reason for this initial isolation of pure
or nearly pure language and cipher is that only then can a
simple meaningful statistical analysis be carried out.

16.    Involutory Systems

If every transformation in a system T is its own
inverse, i.e. if

    T_i T_i = I

for every i, the system will be called involutory. Such
systems are important practically since the enciphering and
deciphering operations are then identical. This leads to
simplified instructions to cryptographic clerks in manual
operation, or in mechanical cases the same machine with the
same key setting may be used for both operations.

Examples:     In simple substitution we may limit our
              transformations to those in which when letter θ is
              the substitute for φ, φ is the substitute for θ.
              Another example is the Beaufort cipher.

If T is involutory, so is the system whose
operations are

    S_i^{-1} T_i S_i

since

    (S_i^{-1} T_i S_i)(S_i^{-1} T_i S_i) = S_i^{-1} T_i T_i S_i = I.

17.    Similar and Weakly Similar Systems

Two secrecy systems R and S will be said to be
similar if there exists a transformation A having an inverse
A^{-1} such that

    R = A S

This means that enciphering with R is the same as enciphering
with S and then operating on the result with the
transformation A. If we write R ~ S to mean R is similar to S then
it is clear that R ~ S implies S ~ R. Also R ~ S and S ~ T imply
R ~ T and finally R ~ R. These are summarized in mathematical
terminology by saying that similarity is an equivalence
relation.

The cryptographic significance of similarity is that
if R ~ S then R and S are equivalent from the cryptanalytic
point of view. Indeed if a cryptanalyst intercepts a
cryptogram in system S he can transform it to one in system R by
merely applying the transformation A to it. A cryptogram in
system R is transformed to one in S by applying A^{-1}. If R
and S are applied to the same language or message space,
there is a one-to-one correspondence between the resulting
cryptograms. Corresponding cryptograms give the same
distribution of a posteriori probabilities for all messages.

If one has a method of breaking the system R then
any system S similar to R can be broken by reducing to R
through application of the operation A. This is a device
that is frequently used in practical cryptanalysis.

Examples:     As a trivial example, simple substitution where
              the substitutes are not letters but arbitrary
              symbols is similar to simple substitution using
              letter substitutes. A second example is the
              Caesar and the reversed Caesar type ciphers.
              The latter is sometimes broken by first
              transforming into a Caesar type. The Vigenère,
              Beaufort and Variant Beaufort are all similar,
              when the key is random. The "autokey" cipher
              primed with the key K_1 K_2 ... K_d is similar to a
              Vigenère type with the key alternately added and
              subtracted mod 26. The transformation A in this
              case is that of "deciphering" the autokey with
              a series of d A's for the priming key.
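The similarity of the Beaufort and Vigenère systems can be made concrete. The sketch below assumes the conventions e = m + k (mod 26) for Vigenère and e = k - m (mod 26) for Beaufort, with letters coded A = 0; the fixed transformation A is taken to be negation of each letter mod 26, and all function names are hypothetical.

```python
N = 26

def vigenere(message, key):
    """e_i = m_i + k_i (mod 26), key repeated periodically."""
    return [(m + key[i % len(key)]) % N for i, m in enumerate(message)]

def beaufort(message, key):
    """e_i = k_i - m_i (mod 26), key repeated periodically."""
    return [(key[i % len(key)] - m) % N for i, m in enumerate(message)]

def negate(text):
    """The fixed, invertible transformation A: replace each letter x by -x (mod 26)."""
    return [(-x) % N for x in text]

message = [7, 4, 11, 11, 14]     # "HELLO" with A = 0
key = [3, 20, 9]
negated_key = [(-k) % N for k in key]

# Beaufort with key k equals A applied to Vigenere with key -k
print(beaufort(message, key) == negate(vigenere(message, negated_key)))
```

Since the key is random, -k is as likely as k, so the set of Beaufort transformations with their probabilities is exactly A applied to the Vigenère system, i.e. R = A S.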

Two systems R and S are weakly similar if there
exist two transformations A and B having inverses A^{-1} and
B^{-1} with

    R = A S B

This means that system R is the same as applying first B
to the language, then S, and finally A. This relation is
also an equivalence relation.

Finding a method of solution for system R with
language L is equivalent to finding a solution for S with
language B L.

We may note that if R is pure and S is weakly
similar to R then S is pure. This follows from

    R_i R_j^{-1} R_k = R_l

    R_i = A S_i B
    R_j^{-1} = B^{-1} S_j^{-1} A^{-1}
    R_k = A S_k B

where we assume corresponding transformations in R and S
to have the same subscripts. Hence

    R_i R_j^{-1} R_k = A S_i S_j^{-1} S_k B = R_l

    S_i S_j^{-1} S_k = A^{-1} R_l B^{-1} = S_l

and S is therefore pure.

PART  II 


Theoretical  Secrecy 


Introduction 

We now consider problems connected with the "theoretical
secrecy" of a system. How immune is a system to cryptanalysis
when the cryptanalyst has unlimited time and manpower available
for the analysis of cryptograms? Does a cryptogram have a
unique solution (even though it may require an impractical amount
of work to find it) and if not how many reasonable solutions does
it have? How much text in a given system must be intercepted
before the solution becomes unique? Are there systems which never
become unique in solution no matter how much enciphered text is
intercepted? Are there systems for which no information whatever
is given to the enemy no matter how much text is intercepted?

18.    Perfect Secrecy

Let us suppose the possible messages are finite in
number M_1, ..., M_n and have a priori probabilities P(M_1), ...,
P(M_n), and that these are enciphered into the possible
cryptograms E_1, ..., E_m by

    E = T_i M.

The cryptanalyst intercepts a particular E and can
then calculate the a posteriori probabilities for the various
messages, P_E(M). It is natural to define perfect secrecy by
the condition that for all E, the a posteriori probabilities are
equal to the a priori probabilities independently of the values
of these. In this case, intercepting the message has given the
cryptanalyst no information.* Any action of his which depends
on the information contained in the cryptogram cannot be altered,
for all of his probabilities as to what the cryptogram contains
remain unchanged. On the other hand, if the condition is not
satisfied there will exist situations in which the enemy has
certain a priori probabilities, and certain key and messages are
chosen where the enemy's probabilities do change. This in turn
may affect his actions and thus perfect secrecy has not been
obtained. Hence the definition given is necessarily required by
our ideas of what perfect secrecy should mean.

*A purist might object that the enemy has obtained a bit of
information in that he knows a message was sent. This may be
answered by having among the messages a "blank" corresponding to
"no message." If no message is originated the blank is enciphered
and sent as a cryptogram. Then even this modicum of remaining
information is eliminated.

A necessary and sufficient condition for perfect
secrecy can be found as follows. We have by Bayes' theorem

    P_E(M) = P(M) P_M(E) / P(E)

and this must equal P(M) for perfect secrecy. Hence either
P(M) = 0, a solution that must be excluded since we demand the
equality independent of the values of P(M), or

    P_M(E) = P(E)

for every M and E. Conversely if P_M(E) = P(E) then

    P_E(M) = P(M)

and we have perfect secrecy. Thus we have the result:

Theorem 9:    A necessary and sufficient condition for
              perfect secrecy is that

                  P_M(E) = P(E)

              for all M and E. That is, P_M(E) must be
              independent of M.

The probability of all keys that transform M_i into a given
cryptogram E is equal to that of all keys transforming M_j into
the same E.

Now there must be as many E's as there are M's, since
fixing i, T_i gives a one-to-one correspondence between all the
M's and some of the E's. For perfect secrecy P_M(E) = P(E) ≠ 0
for any of these E's and any M. Hence there is at least one key
transforming any M into any of these E's. But all the keys from
a fixed M to different E's must be different, and therefore the
number of different keys is at least as great as the number of
M's. It is possible to obtain perfect secrecy with no more, as
one shows by the following example. Let the M_i be numbered 1 to
n and the E_i the same, and using n keys let

    T_i M_j = E_s

where s = i + j (Mod n). In this case we see that
P_M(E) = 1/n = P(E) and we have perfect secrecy. An example is
shown with n = 5.

These perfect systems in which the number of
cryptograms, the number of messages, and the number of keys are all
equal are characterized by the properties that (1) each M is
connected to each E by exactly one line, (2) all keys are equally
likely. Thus the three matrix representations of the system are
"latin squares".

We have then concealed completely an amount of
information at most log n with a size of key log n. This is the first
example of a general principle which we will often see, that
there is a limit to what we can obtain with a given key size;
the amount of uncertainty we can introduce into the solution of a
cryptogram cannot be greater than the key size. Here we have
concealed all the information but the key size is as large as the
message space.
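The n-key example above can be verified by direct computation of the a posteriori probabilities. The sketch below takes n = 5, indexes messages, keys and cryptograms from 0 rather than 1, and uses an arbitrary, hypothetical set of a priori probabilities; exact fractions avoid rounding questions.

```python
from fractions import Fraction

n = 5
# arbitrary (hypothetical) a priori probabilities P(M_j), summing to 1
prior = [Fraction(w, 15) for w in (1, 2, 3, 4, 5)]
key_prob = Fraction(1, n)                 # all n keys equally likely

def encipher(i, j):
    """T_i M_j = E_s with s = i + j (mod n)."""
    return (i + j) % n

# joint probability of (message j, cryptogram s)
joint = [[Fraction(0)] * n for _ in range(n)]
for j in range(n):
    for i in range(n):
        joint[j][encipher(i, j)] += prior[j] * key_prob

# P(E) and the a posteriori probabilities P_E(M) by Bayes' theorem
p_e = [sum(joint[j][s] for j in range(n)) for s in range(n)]
posterior = [[joint[j][s] / p_e[s] for j in range(n)] for s in range(n)]

print(all(row == prior for row in posterior))   # P_E(M) = P(M) for every E
print(p_e)                                      # every cryptogram has P(E) = 1/n
```

Whatever a priori probabilities are chosen, each cryptogram is equally likely and the a posteriori distribution over messages equals the a priori one, which is the defining condition of perfect secrecy.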

We now consider the case where |M| is infinite; in
particular suppose the message generated as an unending sequence
of letters by a Markoff process. The maximum rate of this source
is R_0. It is clear from our results above that no finite key
will give perfect secrecy. We suppose then that the key source
generates key also in the same manner, i.e. as an infinite
sequence of symbols with a mean rate R_K. Suppose that only a
certain length of key L_K is needed to encipher and decipher a
length L_M of message.

Theorem 10:    For perfect secrecy (when the a priori
               probabilities of various messages can be anything),
               for large L

                   R_0 L_M ≤ R_K L_K

               and the rate (R_0 + ε) is asymptotically
               sufficient.

This may be proved by the same method (essentially) as
the finite case. This case is realized by the Vernam system.

These results have been deduced on the basis of unknown
or arbitrary a priori probabilities for the messages. The key
required for perfect secrecy depends then on the total number of
possible messages, or on the maximum rate R_0 of the message
source.

One would suspect that if the message space has fixed
known statistics, so that it has a definite mean rate R of
generating information, then the amount of key needed could be
reduced in an average sense in just this ratio R/R_0, and this is
indeed true. In fact the message can be passed through a
transducer which transforms it into a normal form and reduces the
expected length in just this ratio, and then a Vernam system
may be applied to the result. Evidently the amount of key used
per letter of message is statistically reduced by a factor
R/R_0 and in this case the key source and information source are
just matched; an alternative of key conceals an alternative of
information. It is easily seen also, by the methods used in the
"Information" paper, that this is the best that can be done.

Theorem 11: Perfect secrecy (omitting the condition of
independence of a priori probabilities) for
a source with fixed statistics and a rate
R of generating information can be achieved
with a key source which generates at the
rate $(R + \epsilon)\, L_M / L_K$, where $L_M$ and $L_K$ are message
and key lengths which correspond. A rate
less than $R\, L_M / L_K$ is insufficient.

Perfect secrecy systems have a place in the practical
picture. They may be used either where the greatest importance
is attached to complete secrecy, e.g. correspondence between
the highest levels of command, or in cases where the number of
possible messages is small. Thus, to take an extreme example,
if only two messages "yes" or "no" were anticipated a perfect
system would be in order, with perhaps the transformation table

                K
    M        A     B

    yes      0     1
    no       1     0
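The claim that this two-message system is perfect can be checked directly. The sketch below assumes the two keys A and B are equally likely and picks an arbitrary a priori probability for "yes" (both are illustrative assumptions, as are the function and variable names); it then confirms that the a posteriori probability equals the a priori one for either cryptogram.

```python
from fractions import Fraction

# Enciphering table from the text: rows are messages, columns are keys A and B.
TABLE = {("yes", "A"): 0, ("yes", "B"): 1,
         ("no", "A"): 1, ("no", "B"): 0}


def posterior_yes(prior_yes, cryptogram):
    """Return P(yes | cryptogram), assuming keys A and B are equally likely."""
    prior = {"yes": prior_yes, "no": 1 - prior_yes}
    half = Fraction(1, 2)
    # Joint probability P(M, E): sum over the keys that send M to this cryptogram.
    joint = {
        m: sum(prior[m] * half for k in "AB" if TABLE[(m, k)] == cryptogram)
        for m in prior
    }
    return joint["yes"] / sum(joint.values())


# The prior 9/10 is an arbitrary illustrative choice.
prior_yes = Fraction(9, 10)
post_0 = posterior_yes(prior_yes, 0)
post_1 = posterior_yes(prior_yes, 1)
print(post_0, post_1)
```

Intercepting either cryptogram leaves the probabilities of the two messages unchanged, which is the defining property of perfect secrecy used in this section.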

The disadvantage of perfect systems for large correspondence
systems is, of course, the equivalent amount of key
that must be sent. In succeeding sections we consider what can
be achieved with smaller key size, in particular with finite
keys.

19.  Equivocation 

Let us suppose that a simple substitution cipher has
been used on English text and that we intercept a certain amount,
N letters, of the enciphered text. For N fairly large, more
than say 50 letters, there is nearly always a unique solution to
the cipher; i.e. a single good English sequence which transforms
into the intercepted material by a simple substitution. With a
smaller N, however, the chance of more than one solution is
greater; with N = 15 there will generally be quite a number of
possible fragments of text that would fit, while with N = 8 a
good fraction (of the order of 1/8) of all reasonable English
sequences of that length are possible, since there is seldom
more than one repeated letter in the 8. With N = 1 any letter
is clearly possible and has the same a posteriori probability
as its a priori probability. For one letter the system is
perfect.

This happens generally with solvable ciphers. Before
any material is intercepted we can imagine the a priori
probabilities attached to the various possible messages, and also
to the various keys. As material is intercepted, the cryptanalyst
calculates the a posteriori probabilities; and as N increases
the probabilities of certain messages increase, and of most
decrease, until finally only one is left, which has a probability
nearly one, while the total probability of all others is nearly
zero.

This calculation can actually be carried out for very
simple systems. Table 1 shows the a posteriori probabilities
for a Caesar type cipher applied to English text, with the key
chosen at random from the 26 possibilities. To enable the use
of standard letter, digram and trigram frequency tables the text
has been started at a random point (by opening a book and putting
a pencil down at random on the page). The message selected in
this way begins "creases to . . ." starting inside the word
increases. If the message were to start with the beginning of a
sentence a different set of probabilities must be used, corresponding
to the frequencies of letters, digrams, etc., at the
beginning of sentences.

The Caesar with random key is a pure cipher and the
particular key chosen does not affect the a posteriori
probabilities. To determine these we need merely list the possible
decipherments by all keys and calculate their a priori
probabilities. The a posteriori probabilities are these divided by
their sum. These possible decipherments are found by the
standard process of "running down the alphabet" from the message
and are listed at the left. These form the residue class for
the message. For one intercepted letter the a posteriori
probabilities are equal to the a priori probabilities for letters and
are shown in the column headed N = 1. For two intercepted
letters the probabilities are those for digrams adjusted to
sum to unity and these are shown in the column N = 2.

Table 1

A Posteriori Probabilities for a Caesar Type Cryptogram

    Decipherments   N = 1   N = 2   N = 3   N = 4

    C R E A S       .032    .015    .111    .55
    D S F B T       .036    .068
    E T G C U       .123    .170
    F U H D V       .023    .023
    G V I E W       .016
    H W J F X       .051    .015
    I X K G Y       .072
    J Y L H Z       .001
    K Z M I A       .005
    L A N J B       .040    .072    .250    .01
    M B O K C       .020    .019    .022    .01
    N C P L D       .072    .066
    O D Q M E       .079    .034
    P E R N F       .023    .085    .438    .43
    Q F S O G       .002
    R G T P H       .060    .013
    S H U Q I       .066    .064    .005
    T I V R J       .096    .272    .166
    U J W S K       .030
    V K X T L       .009
    W L Y U M       .020    .008    .005
    X M Z V N       .002
    Y N A W O       .019    .006
    Z O B X P       .001
    A P C Y Q       .080    .066
    B Q D Z R       .016

    Q (digits)      1.248   .999    .602    .340
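The procedure described for Table 1 (list every decipherment by "running down the alphabet", then divide the a priori probabilities by their sum) can be sketched as follows. The function names are illustrative, and the a priori weights used in the example are hypothetical values chosen only so that their ratios reproduce the N = 4 column; they are not taken from a frequency table.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_decipherments(cryptogram):
    """List the 26 candidate texts obtained by running down the alphabet."""
    candidates = []
    for shift in range(26):
        text = "".join(
            ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in cryptogram
        )
        candidates.append(text)
    return candidates


def a_posteriori(candidates, a_priori):
    """Divide the a priori probabilities of the candidates by their sum."""
    weights = [a_priori.get(c, 0.0) for c in candidates]
    total = sum(weights)
    return {c: w / total for c, w in zip(candidates, weights) if w > 0}


# The residue class of the five-letter message of Table 1.
residue_class = caesar_decipherments("CREAS")

# Hypothetical a priori probabilities for four-letter sequences (assumption).
hypothetical_weights = {"CREA": 0.00055, "PERN": 0.00043,
                        "LANJ": 0.00001, "MBOK": 0.00001}
posterior_n4 = a_posteriori(caesar_decipherments("CREA"), hypothetical_weights)
print(residue_class[:3], posterior_n4)
```

Because the Caesar cipher with random key is pure, the same list is obtained whichever member of the residue class is used as the starting point.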

Trigram frequencies have also been tabulated and these are shown
in column N = 3. For four and five letter sequences probabilities
were obtained by multiplication from trigram frequencies
since approximately

$$p(ijkl) = p(ijk)\, p_{jk}(l)$$

Note that at three letters the field has narrowed down
to four messages of fairly high probability, the others being
small in comparison. At four there are two possibilities and at
five just one, the correct decipherment.

In principle this could be carried out with any system,
but unless the key is very small the number of possibilities is
so large that the work involved prohibits the actual calculation.


This set of a posteriori probabilities describes how
the cryptanalyst's knowledge of the message and key gradually
becomes more precise as enciphered material is obtained. This
description, however, is much too involved and difficult to
obtain for our purposes. What is desired is a simplified
description of this approach to uniqueness of the possible solutions.

We will first define a quantity Q called the "equivocation"
which measures in an average way the uncertainty of
the solution, or how far it is from unicity. Suppose that a
certain cryptogram E of N letters has been intercepted. The
cryptanalyst can in principle calculate the a posteriori
probabilities by the use of Bayes' theorem. Thus

$$P_E(M) = \frac{P(M)\, P_M(E)}{P(E)}$$

Similarly the probabilities for various keys, after E has been
intercepted, are given by

$$P_E(K) = \frac{P(K)\, P_K(E)}{P(E)}$$

The equivocation of the message should measure in some
way how spread out these probabilities $P_E(M)$ are; how far they
are from being concentrated at one message. In line with our
general principles of measuring such dispersion, as in the case of
choice, uncertainty, and generating information, we define
the equivocation of the message when E has been intercepted as

$$Q(M) = -\sum_{M} P_E(M) \log P_E(M)$$

the summation being over all possible messages. Similarly the
equivocation in key when E is intercepted is given by

$$Q(K) = -\sum_{K} P_E(K) \log P_E(K)$$


The same general arguments used to justify our measure
of information rate may be used here, to justify the equivocation
measure. We note that equivocation zero requires that one message
(or key) have probability one, all others zero. Equivocation
is measured in the same units as information, i.e. alternatives,
digits, etc., according as the logarithmic base is 2, 10, etc.
In fact, equivocation is almost identical with information, the
difference being one of point of view. In information we stress
the notion of how much freedom we have in choosing one element
from a set with certain probabilities; in equivocation we emphasize
the uncertainty of our knowledge of what was chosen when the
probabilities have certain values.


Although any one number can hardly be expected to describe
the set $P_E(M)$ perfectly for all purposes, I think the Q
defined here does as well as any single statistic can. Some of
the theorems which follow indicate the mathematical "naturalness"
of this particular measure.

The values of equivocation for the Caesar type cryptogram
considered above have been calculated and are given in the
last row of Table 1. This is the Q for both key and message,
the two being equal in this case.
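As a check on the last row of Table 1, the equivocation of a column can be recomputed from its entries. The sketch below uses the N = 3 and N = 4 columns as read from the table and logarithms to base 10, so the result is in digits. The tabulated probabilities are rounded, so agreement with the printed values .602 and .340 is only approximate; the function name is illustrative.

```python
import math


def equivocation(probabilities, base=10):
    """Return -sum p log p over the nonzero probabilities."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


# A posteriori probabilities read from the N = 3 and N = 4 columns of Table 1.
column_n3 = [0.111, 0.250, 0.022, 0.438, 0.005, 0.166, 0.005]
column_n4 = [0.55, 0.01, 0.01, 0.43]

q3 = equivocation(column_n3)
q4 = equivocation(column_n4)
print(round(q3, 3), round(q4, 3))
```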

The definitions given above involve a particular intercepted
E, and are the equivocations for that intercepted cryptogram.
We wish, however, to find a measure of the equivocation
for the system as a whole, which will describe this progress
toward uniqueness as N increases in an average sort of way.
To do this we form a weighted average of the equivocations for
each particular intercepted message E, weighting in accordance
with the probabilities of getting the E in question. This will
be called the mean equivocation of the system, or where there
is no chance of confusion with the narrower equivocation for a
particular E, we abbreviate to merely the equivocation. The
mean equivocation of message is

$$Q(M) = -\sum_{M,E} P(E)\, P_E(M) \log P_E(M)$$

the summation being over all M and all E. Since

$$P(E)\, P_E(M) = P(E, M)$$

the probability of getting both E and M, we can write this

$$Q(M) = -\sum P(M,E) \log P_E(M) = -\sum P(M,E) \log \frac{P(M)\, P_M(E)}{P(E)}$$

Similarly

$$Q(K) = -\sum P(K,E) \log \frac{P(K)\, P_K(E)}{P(E)}$$


Either of these mean equivocations is a theoretical
measure of the secrecy value of the system. We say theoretical
since even when the equivocation is zero, which corresponds to
no uncertainty as to the message, it may require a tremendous
amount of labor to locate the particular message where the
probability is one. It might, for example, be necessary to try each
possible K in succession until one was found that transformed
the intercepted E into reasonable text in the language. Thus a
system would be practically very good, but theoretically solvable.
The equivocation may be said to measure the degree of secrecy
when the cryptanalyst has unlimited time and energy.

The equivocation is, of course, a function of N, the
number of letters intercepted. The functions Q(K,N) and Q(M,N)
will be called the equivocation characteristics of the system.

The following data will be helpful in forming a picture
of what small values of equivocation represent.

An equivocation of .1 alternative would result (1) if
9 times in 10 there was no uncertainty as to M, the tenth time
two M's were equally probable, or (2) if every time there were
two possibilities one with probability .983, the other with
probability .017, or (3) if 99 times in 100 there was no
uncertainty, the 100th time 1000 equally likely possibilities.

An equivocation of .01 would result (1) if every time
there were two possibilities one with probability .999, the other
with probability .001, or (2) if 99 times in 100 there is no
uncertainty, the other time two equally likely possibilities, or
(3) if 999 times in 1000 there is no uncertainty, the other time
6 or 7 equally likely possibilities.
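Mixed cases of this kind are weighted averages of the equivocations of the individual situations, so they are easy to evaluate. The sketch below, in alternatives (logarithms to base 2), works out three of the cases quoted above: the first and third examples for .1 and the second example for .01. The helper names are illustrative.

```python
import math


def entropy_bits(probabilities):
    """Equivocation of one situation, in alternatives (log base 2)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def mean_equivocation(cases):
    """cases is a list of (weight, distribution) pairs; weights sum to one."""
    return sum(weight * entropy_bits(dist) for weight, dist in cases)


certain = [1.0]
# 9 times in 10 no uncertainty, the tenth time two equally probable messages.
case_a = mean_equivocation([(0.9, certain), (0.1, [0.5, 0.5])])
# 99 times in 100 no uncertainty, the 100th time 1000 equally likely messages.
case_b = mean_equivocation([(0.99, certain), (0.01, [1 / 1000] * 1000)])
# 99 times in 100 no uncertainty, the other time two equally likely messages.
case_c = mean_equivocation([(0.99, certain), (0.01, [0.5, 0.5])])
print(case_a, case_b, case_c)
```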


20. Properties of Equivocation

Equivocation may be shown to have a number of interesting
properties, most of which fit into our intuitive picture
of how such a quantity should behave. We may first show, by an
example, the somewhat surprising fact that after a cryptanalyst
has intercepted certain special E's, his equivocation as to key
or message may be greater than before he intercepted anything.
The intercepted material has increased his ignorance of what
happened! Suppose there are only two messages $M_1$ and $M_2$ with
a priori probabilities p and q, and that a simple substitution
is used according to the following table, the two keys $K_1$ and $K_2$
also having the a priori probabilities p and q.

              K_1    K_2

    M_1       E_2    E_1
    M_2       E_1    E_2

Before the interception, the equivocation of both key and message
is $-(p \log p + q \log q)$, which is less than one alternative if
$p \ne q$. If $p \gg q$ there is little uncertainty as to which message
and key will be chosen, $M_1$ and $K_1$. Now suppose he intercepts $E_1$.
The a posteriori probabilities of both keys and both messages are
easily seen to be 1/2, and hence the equivocation for both key
and message is one alternative, greater than before. On the other
hand, if $E_2$ is intercepted, the more probable event, the
equivocation for both key and message decreases, more than enough to
compensate for the other increase, and the mean equivocation of
both key and message decreases. This is a general property of all
secrecy systems.
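The example can be worked numerically. The sketch below reads the table as sending ($M_1$, $K_1$) and ($M_2$, $K_2$) to $E_2$ and the two mixed pairs to $E_1$, which is the reading consistent with $E_2$ being the more probable cryptogram; the value p = 0.9 and all names are illustrative assumptions.

```python
import math


def entropy_bits(probabilities):
    """Equivocation in alternatives (log base 2)."""
    return -sum(x * math.log2(x) for x in probabilities if x > 0)


def two_message_example(p):
    """Return (prior equivocation, equivocation per cryptogram, mean)."""
    q = 1 - p
    prior = {"1": p, "2": q}  # same numbers serve for M1/M2 and for K1/K2
    # Assumed table: (M1,K1)->E2, (M1,K2)->E1, (M2,K1)->E1, (M2,K2)->E2.
    table = {("1", "1"): "E2", ("1", "2"): "E1",
             ("2", "1"): "E1", ("2", "2"): "E2"}
    joint = {}  # joint[e][m] = P(E = e, M = m)
    for (m, k), e in table.items():
        joint.setdefault(e, {}).setdefault(m, 0.0)
        joint[e][m] += prior[m] * prior[k]
    before = entropy_bits([p, q])
    after = {}
    mean = 0.0
    for e, row in joint.items():
        p_e = sum(row.values())
        after[e] = entropy_bits([v / p_e for v in row.values()])
        mean += p_e * after[e]
    return before, after, mean


before, after, mean = two_message_example(0.9)
print(before, after, mean)
```

With p = 0.9 the equivocation rises to one alternative when $E_1$ is intercepted, falls when $E_2$ is intercepted, and the weighted mean is below the prior value, as the text states.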

Theorem 12: The mean equivocation of key, $Q_K(N)$, is a non-increasing
function of N. The mean equivocation of the
first A letters of the message is a non-increasing
function of the number N which have been intercepted.
If N letters have been intercepted, the equivocation
of the first N letters of message is less than or
equal to that of the key. These may be written

$$Q_K(S) \le Q_K(N), \qquad S \ge N$$
$$Q_M(S) \le Q_M(N), \qquad S \ge N \quad (Q \text{ for the first } A \text{ letters of message})$$
$$Q_M(N) \le Q_K(N)$$

The qualification regarding A letters in the second
result of the theorem is so that the equivocation will not be
calculated with respect to the amount of message that has been
intercepted. If it is, the message equivocation may (and usually
does) increase for a time, due merely to the fact that more
letters stand for a larger possible range of messages. The
results of the theorem are what we might hope from a good measure
of equivocation, since we would hardly expect to be worse off on
the average after intercepting material than before. The fact
that they can be proved gives additional justification to our
definition.

The results of this theorem can be proved by a substitution
in the property 6 of section 1. Thus to prove the
first or second we have for any chance events A and B

$$Q(B) \ge Q_A(B)$$

If we identify B with the key (knowing the first S letters of
cryptogram) and A with the remaining N - S letters we obtain
the first result. Similarly identifying B with the message
gives the second result. The last result follows from

$$Q(M) \le Q(K) + Q_K(M)$$

and the fact that $Q_K(M) = 0$ since K uniquely determines M.

Theorem 13:

$$Q(K) = |M| - |E| + |K|$$
$$Q(M) = |M| - |E| + |H|$$

where

$$|H| = -\sum_{M,E} P(M,E) \log P_M(E)$$

We have

$$Q(K) = -\sum_{E,K} P(K)\, P_K(E) \log \frac{P(K)\, P_K(E)}{P(E)}$$

Hence

$$Q(K) = -\sum P(K)\, P_K(E) \log P(K) - \sum P(K)\, P_K(E) \log P_K(E) + \sum P(K)\, P_K(E) \log P(E)$$

Summing the first term on E gives $-\sum P(K) \log P(K) = |K|$.
In the second term $P_K(E)$ is P(M), the unique M that gives E
with key K. Summing on K then gives $-\sum P(M) \log P(M) = |M|$.
The third term is $\sum P(E) \log P(E) = -|E|$.

The second equation in the theorem is proved by the
same method.

$$Q(M) = -\sum P(E)\, P_E(M) \log P_E(M)$$
$$= -\sum P(M)\, P_M(E) \log \frac{P(M)\, P_M(E)}{P(E)}$$
$$= -\sum P(M)\, P_M(E) \log P(M) - \sum P(M)\, P_M(E) \log P_M(E) + \sum P(M)\, P_M(E) \log P(E)$$
$$= |M| - |E| - \sum P(M)\, P_M(E) \log P_M(E)$$

The last term here may be interpreted as follows. Group together
all the different keys that transform a fixed M into
the same E, giving the total probability to the group, which
will be $P_M(E)$. The last term is the average size of this group
space weighted according to the probability P(M) of choosing
among the groups leading out of M. In case no group contains
more than one element (at any rate no group from an M with
P(M) > 0) then $|H| = |K|$ and Q(K) = Q(M). This is also clear
since there is then a one-to-one correspondence between the
keys and messages for any given E.
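Both equations of the theorem can be verified numerically on a small system. The sketch below uses a hypothetical toy cipher (three messages, two keys, E = (M + K) mod 3, with arbitrary a priori probabilities); none of these choices come from the text. It computes $|M|$, $|K|$, $|E|$, $|H|$, Q(K) and Q(M) directly from the joint probabilities.

```python
import math
from collections import defaultdict


def H(dist):
    """Entropy in alternatives of a dict (or list) of probabilities."""
    values = dist.values() if isinstance(dist, dict) else dist
    return -sum(p * math.log2(p) for p in values if p > 0)


def cond_entropy(joint):
    """Mean equivocation of X given E from a dict {(x, e): P(x, e)}."""
    p_e = defaultdict(float)
    for (_, e), p in joint.items():
        p_e[e] += p
    return -sum(p * math.log2(p / p_e[e])
                for (_, e), p in joint.items() if p > 0)


# Hypothetical toy system (assumption): E = (M + K) mod 3.
p_m = {0: 0.6, 1: 0.3, 2: 0.1}
p_k = {0: 0.7, 1: 0.3}

joint_ke = defaultdict(float)  # P(K, E)
joint_me = defaultdict(float)  # P(M, E)
for m, pm in p_m.items():
    for k, pk in p_k.items():
        e = (m + k) % 3
        joint_ke[(k, e)] += pm * pk
        joint_me[(m, e)] += pm * pk

p_e = defaultdict(float)
for (_, e), p in joint_me.items():
    p_e[e] += p

size_m, size_k, size_e = H(p_m), H(p_k), H(p_e)
# |H| = -sum P(M,E) log P_M(E), with P_M(E) = P(M,E) / P(M).
size_h = -sum(p * math.log2(p / p_m[m]) for (m, _), p in joint_me.items())

q_k = cond_entropy(joint_ke)
q_m = cond_entropy(joint_me)
print(q_k, size_m - size_e + size_k)
print(q_m, size_m - size_e + size_h)
```

In this toy system no two keys send the same message to the same cryptogram, so $|H| = |K|$ and the two equivocations coincide, matching the remark at the end of the proof.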

From the first equation of the theorem we may conclude
that $Q(K) = |K|$ in case $|M| = |E|$. This latter occurs in particular
if all M's are equally likely and all E's equally likely
and there are the same number of each. It is easy to see that
this is the case with a language in which every letter is equally
likely and independent, and when almost any of the simple ciphers
are used.

If we have a product system S = T R, it is to be expected
that the second enciphering process does not decrease
the equivocation of message and this is actually true as can
be shown by the methods used above. If T and R commute either
may be considered as being the first and hence in this case
the equivocation with S is not less than the maximum for the
two systems R and T. Simple examples show that this does not
hold necessarily if R and T do not commute.

Theorem 14: The equivocation in message of a product
system S = T R is not less than that when
only R is used. If T R = R T it is not less
than the maximum of those for R and T alone.

If we have a product of several systems R S T U, we
can of course extend this, to say that the equivocation of
R S T U is not less than that of S T U, which is not less than
that for T U, etc.

There is no similar theorem for the inner product since
for example if T and R are inverse processes their inner product
is the identity and the resulting equivocation zero.

Suppose we have a system T which can be written as a
weighted sum of several systems R, S, ..., U

$$T = p_1 R + p_2 S + \cdots + p_m U, \qquad \sum p_i = 1$$

and that systems R, S, ..., U have equivocation characteristics
$Q_1, Q_2, \ldots, Q_m$.

Theorem 15: The equivocation Q of a weighted sum of
systems is bounded by the inequalities

$$\sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i$$

These are best limits possible. The Q's may refer either to
key or to message.

The upper limit is achieved, for example, in strongly
ideal systems (to be described later) where the decomposition
is into the simple transformations of the system. The lower
limit is achieved if all the systems R, S, ..., U go to
completely different cryptogram spaces. This theorem is also proved
by the general inequalities governing equivocation,

$$Q_A(B) \le Q(B) \le Q(A) + Q_A(B)$$

We identify A with the particular system being used and B with
the key or message.

There is a similar theorem for weighted sums of
languages.

Theorem 16: Suppose a system can be applied to languages
$L_1, L_2, \ldots, L_m$ and has equivocation characteristics
$Q_1, Q_2, \ldots, Q_m$. When applied to
the weighted sum $\sum p_i L_i$, the equivocation Q
is bounded by

$$\sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i$$

These limits are the best possible and the equivocations in
question can be either for key or message.

The proof here is essentially the same as for the
preceding case.

An important consequence of the result

$$Q(K) = |K| + |M| - |E|$$

is the following.

Theorem 17: In any closed system, or any system where
the total number of possible cryptograms is
equal to the number of possible messages
of N letters,

$$Q(K) \ge |K| - (|M_0| - |M|) = |K| - D_N$$

where $|M_0| = \log H$, with H the number of possible
messages of N letters. $D_N$ is the total
redundancy for N letters.

This is true since $|M_0| \ge |E|$, the equality holding
only if all cryptograms are equally likely. The theorem states
that in a closed system the key is determined only by the
redundancy of the language: the equivocation can decrease only
as the redundancy comes into action and at no greater rate.

Suppose we have a pure system and let the different
residue classes of messages be $C_1, C_2, \ldots, C_r$. The corresponding
set of residue classes of cryptograms is $C_1', \ldots, C_r'$.
The probability of each E in $C_i'$ is the same:

$$P(E) = \frac{P(C_i)}{\varphi_i}, \qquad E \in C_i'$$

where $\varphi_i$ is the number of different messages in $C_i$. Thus

$$|E| = -\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i}$$

Substituting in our equation for Q, we obtain:

Theorem 18: For a pure cipher

$$Q = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i}$$

This result can be used to compute Q in many cases of interest.

From the analytic point of view pure ciphers have a
simple structure. If a cryptogram is intercepted its residue
class gives the complete information obtained by the cryptanalyst.
Within the residue class the system is perfect: each message
in the class has an a posteriori probability equal to its
a priori probability (renormalized within the class). For large N,
beyond the unicity point, there will usually only be one M in the
class of reasonable probability, and the problem is to determine this M.

The theorem on equivocation of pure ciphers can be
altered to show this. With k the number of keys, we have

$$\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = \sum_i P(C_i) \log P(C_i) + \sum_i P(C_i) \log \frac{k}{\varphi_i} - \sum_i P(C_i) \log k$$
$$= \sum_i P(C_i) \log P(C_i) + Q_M(K) - |K|$$

Hence

$$Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = |M| + Q_M(K) + \sum_i P(C_i) \log P(C_i)$$

and

$$Q(M) = |M| - \Big[-\sum_i P(C_i) \log P(C_i)\Big]$$

The equivocation of message is the equivocation of message before
the cryptogram was intercepted less the information imparted by
specification of its residue class.


21. Key Appearance Characteristic

Suppose the cryptanalyst has N letters of message
and N letters of the equivalent cryptogram. Then he can calculate
the a posteriori probabilities of the various keys on
the basis of this information, and if N is small there will
remain a certain equivocation of key. For example in simple
substitution, knowing 20 letters of message and cryptogram
does not disclose the entire key, since only about 12 letters
of the 26 will be represented. Thus there is a residual
equivocation of log (26-12)!, if exactly 12 letters appear.
We define the mean residual key equivocation as

$$Q_M(K) = -\sum_{E,M,K} P(E,M)\, P_{E,M}(K) \log P_{E,M}(K)$$

where P(E,M) is the a priori probability of having message M
and cryptogram E, and $P_{E,M}(K)$ is the conditional probability
of K with E and M given.

This may be written by obvious arguments (assuming
all keys equally likely)

$$Q_M(K) = \sum_{M,K} P(M,K) \log X(M,K)$$

where X(M,K) is the number of different keys from M in parallel
with K, that is, which go to the same E as K.

For simple substitution let $P_X$ be the probability
that a received cryptogram of N letters has X different letters
appearing in it. Then

$$Q_M(K) = \sum_X P_X \log (26 - X)!$$


Approximately 


log  lbgV^26A) 

,  r 

The bracketed terms vary slowly with X and if P(X) is fairly
well concentrated, we may take the bracket out, replacing X
by its mean value $\bar{X}$. This gives, after recombination

$$Q_M(K) \approx \log (26 - \bar{X})!$$

This residual key equivocation is shown for simple substitution
on English in Fig. 12. It measures how much of the
key has not been used in enciphering N letters of text on
the average.
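The residual key equivocation for simple substitution is easy to evaluate once a distribution for the number X of distinct letters is given. The sketch below uses a small hypothetical distribution concentrated near X = 12 (an assumption for illustration, not data from the text) and compares the exact sum with the approximation obtained by replacing X with its mean. Logarithms are to base 2.

```python
import math


def residual_key_equivocation(distinct_letters, alphabet_size=26):
    """log2 (alphabet_size - X)!, the equivocation of the unused key part."""
    return math.log2(math.factorial(alphabet_size - distinct_letters))


def mean_residual(p_x, alphabet_size=26):
    """Sum over X of P_X * log2 (alphabet_size - X)! for a dict {X: P_X}."""
    return sum(p * residual_key_equivocation(x, alphabet_size)
               for x, p in p_x.items())


# Hypothetical distribution of the number of distinct letters (assumption).
p_x = {11: 0.25, 12: 0.5, 13: 0.25}

exact = mean_residual(p_x)
mean_x = sum(x * p for x, p in p_x.items())
approx = residual_key_equivocation(round(mean_x))
print(exact, approx)
```

For exactly 12 letters appearing, log (26-12)! comes to about 36.3 alternatives, and for a distribution this narrow the mean-value approximation differs from the exact sum by only a few hundredths of an alternative.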

Theorem 19: $Q(K) = Q(M) + Q_M(K)$

That is, the total key equivocation (when we don't know the
message) is the sum of the message equivocation and the residual
key equivocation; i.e., the equivocation there would
be in the key if we did know the message. This follows from
the fact that the key uniquely determines the message and
properties 4 and 5 in Section X.

22. Equivocation for Simple Substitution on an Independent
Letter Language

We will now calculate the mean equivocation in key
or message when simple substitution is applied to a two
letter language, probabilities p and q for 0 and 1, with
successive letters independent. We have

$$Q_K = Q_M = -\sum P(E)\, P_E(K) \log P_E(K)$$

The probability that E contains exactly s 0's in a particular
permutation is

$$\tfrac{1}{2}\left(p^s q^{N-s} + q^s p^{N-s}\right)$$

and the a posteriori probabilities of the identity and inverting
substitutions are respectively

$$\frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}, \qquad \frac{q^s p^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}$$

There are $\binom{N}{s}$ terms for each s and hence

$$Q(N) = -\tfrac{1}{2} \sum_s \binom{N}{s} \left[ p^s q^{N-s} \log \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}} + q^s p^{N-s} \log \frac{q^s p^{N-s}}{p^s q^{N-s} + q^s p^{N-s}} \right]$$

This may be written

$$Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s} \left[ s \log p + (N-s) \log q - \log \left(p^s q^{N-s} + q^s p^{N-s}\right) \right]$$
$$= -N\,[p \log p + q \log q] + \sum_s \binom{N}{s} p^s q^{N-s} \log \left(p^s q^{N-s} + q^s p^{N-s}\right)$$
$$= NR + \tfrac{1}{2} \sum_s \binom{N}{s} \left(p^s q^{N-s} + q^s p^{N-s}\right) \log \left(p^s q^{N-s} + q^s p^{N-s}\right)$$

For p = 1/3, q = 2/3, and for p = 1/8, q = 7/8, Q has been calculated
and is shown in Fig. 13.
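The closed form for the two-letter language can be evaluated directly and cross-checked against a brute-force enumeration of all cryptograms. The sketch below assumes the two keys (identity and inverting substitution) are equally likely, as in the derivation, and works in alternatives; the function names are illustrative.

```python
import math
from itertools import product


def q_closed_form(n, p):
    """N R + (1/2) sum_s C(N,s) (A + B) log2 (A + B),
    with A = p^s q^(N-s) and B = q^s p^(N-s)."""
    q = 1 - p
    rate = -(p * math.log2(p) + q * math.log2(q))
    total = 0.0
    for s in range(n + 1):
        a = p**s * q**(n - s)
        b = q**s * p**(n - s)
        total += math.comb(n, s) * (a + b) * math.log2(a + b)
    return n * rate + 0.5 * total


def q_brute_force(n, p):
    """Mean key equivocation by enumerating every cryptogram of length n."""
    q = 1 - p
    result = 0.0
    for e in product((0, 1), repeat=n):
        s = sum(1 for letter in e if letter == 0)
        a = 0.5 * p**s * q**(n - s)   # joint probability with the identity key
        b = 0.5 * q**s * p**(n - s)   # joint probability with the inverting key
        p_e = a + b
        result -= a * math.log2(a / p_e) + b * math.log2(b / p_e)
    return result


values = [q_closed_form(n, 1 / 3) for n in range(0, 11)]
print([round(v, 4) for v in values])
```

The sequence starts at one alternative for N = 0 (the full key size for two keys) and does not increase with N, in line with Theorem 12.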


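Now assume the language contains r different
letters chosen independently and with probabilities $p_1$,
$p_2, \ldots, p_r$. By approximately the same argument we have

$$Q(N) = -\sum \binom{N}{s_1 \cdots s_r}\, p_1^{s_1} p_2^{s_2} \cdots p_r^{s_r} \log \frac{p_1^{s_1} \cdots p_r^{s_r}}{\sum_{\alpha} p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r}}$$

where the outer sum is over all $s_1, \ldots, s_r$ with $\sum s_i = N$, the
coefficient is $N!/(s_1! \cdots s_r!)$, and $\sum_{\alpha}$ is over all
permutations of 1, 2, ..., r for $\alpha_1, \ldots, \alpha_r$.

Hence, by obvious transformations

$$Q(N) = NR + \frac{1}{r!} \sum \binom{N}{s_1 \cdots s_r} \Big( \sum_{\alpha} p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r} \Big) \log \Big( \sum_{\alpha} p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r} \Big)$$

where $R = -\sum p_i \log p_i$. In particular,

$$Q(0) = \frac{1}{r!}\, r! \log r! = \log r! = |K|$$

$$Q(1) = R + \frac{1}{r!}\, r\,(r-1)! \log (r-1)! = R + \log (r-1)!$$

This checks the evident answer for Q(1): the first
symbol has equivocation R and the parts of the key not used
add log (r-1)!.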
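The two special cases Q(0) = log r! and Q(1) = R + log (r-1)! can be confirmed by brute force for a small alphabet. The sketch below enumerates every message and every one of the r! substitutions (all equally likely) for a hypothetical three-letter language; the letter probabilities and names are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import permutations, product


def key_equivocation(letter_probs, n):
    """Brute-force mean key equivocation, in alternatives, for simple
    substitution on n independent letters with all r! keys equally likely."""
    r = len(letter_probs)
    keys = list(permutations(range(r)))
    joint = defaultdict(float)  # P(K, E)
    for message in product(range(r), repeat=n):
        p_message = math.prod(letter_probs[x] for x in message)
        for key in keys:
            cryptogram = tuple(key[x] for x in message)
            joint[(key, cryptogram)] += p_message / len(keys)
    p_e = defaultdict(float)
    for (_, e), p in joint.items():
        p_e[e] += p
    return -sum(p * math.log2(p / p_e[e])
                for (_, e), p in joint.items() if p > 0)


probs = [0.5, 0.3, 0.2]  # hypothetical three-letter language (assumption)
rate = -sum(p * math.log2(p) for p in probs)
q0 = key_equivocation(probs, 0)
q1 = key_equivocation(probs, 1)
print(q0, math.log2(math.factorial(3)))
print(q1, rate + math.log2(math.factorial(2)))
```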


23. The Equivocation Characteristic for a "Random" Closed
Cipher

In the preceding section we have calculated the
equivocation characteristic for a simple substitution applied
to an independent letter language. This is about the simplest
type of cipher and the simplest language structure possible,
yet already the formulas are so involved as to be nearly
useless. What are we to do with cases of practical interest,
say the involved transformations of a fractional transposition
system applied to English with its extremely complex
statistical structure? This complexity itself suggests the
method of approach. Sufficiently complicated problems can
frequently be solved statistically. In order to do this we
define the notion of a "random" cipher.

We suppose that the possible messages of length N
can be divided into two groups, one group of high and fairly
uniform probability, while the total probability in the
second group is small. This is usually possible in information
theory if the messages have any reasonable length. Let
the total number of messages be

$$H = 2^{R_0 N}$$

where $R_0$ is the maximum rate and N the number of letters. The
high probability group will contain about

$$S = 2^{RN}$$

where R is the statistical rate.

The deciphering operation defines a function from cryptograms
to messages which can be thought of as a series of lines, k for each E,
going back to various M's. By a random cipher we will mean
one in which all keys are equally likely and the k lines
from any E go back to random M's. The equivocation in key
is given by

$$Q(K) = -\sum P(E)\, P_E(K) \log P_E(K)$$

The probability of exactly m lines going back
to the high probability group is

$$\binom{k}{m} \left(\frac{S}{H}\right)^{m} \left(1 - \frac{S}{H}\right)^{k-m}$$

If a cryptogram with m lines going to high probability messages
is intercepted, the equivocation is log m. The probability
of intercepting such a cryptogram is easily seen to be

$$\frac{mH}{Sk}$$

Hence the mean equivocation is

$$Q = \frac{H}{Sk} \sum_{m=1}^{k} \binom{k}{m} \left(\frac{S}{H}\right)^{m} \left(1 - \frac{S}{H}\right)^{k-m} m \log m$$


We wish to find an approximation to this for large k. If the
expected value of m, namely $\bar{m} = \frac{S}{H} k$, is $\gg 1$, the variation of
log m over the range where the binomial distribution assumes
large values will be small and we can replace log m by log $\bar{m}$.
This then comes out of the summation leaving the expected value of m.
Hence in this condition

$$Q \approx \log \frac{S}{H} k$$
$$= \log S - \log H + \log k$$
$$= |K| - |M_0| + |M|$$
$$= |K| - ND$$

If \bar{m} is small compared to the large k, the binomial
distribution can be approximated by a Poisson distribution.*

        \binom{k}{m} p^m q^{k-m} \approx \frac{e^{-\lambda} \lambda^m}{m!},    \lambda = \frac{S k}{H}

with p = S/H and q = 1 - p. Hence

        Q = \frac{1}{\lambda} e^{-\lambda} \sum_{m=2}^{\infty} \frac{\lambda^m}{m!} m \log m
          = e^{-\lambda} \sum_{m=1}^{\infty} \frac{\lambda^m}{m!} \log (m + 1)

when we write (m + 1) for m. This may be used in the region
where \lambda is near unity. For \lambda << 1 the only
important term in the series is m = 1; omitting the others

        Q = e^{-\lambda} \lambda \log 2
          \approx \lambda \log 2
          = 2^{|K|} 2^{-N D} \log 2

*Fry, Probability and Its Engineering Uses, p. 214.

Thus Q(K) starts off at |K|, and decreases linearly with
slope -D out to the neighborhood of N = |K|/D. After a short
transition region, Q follows an exponential with "half life"
distance 1/D if D is in alternatives per letter. If D is in
digits per letter 1/D is the distance for a decrease by a
factor of 10. The behavior is shown in Fig. 14 with the
approximating curves.
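The two regimes of the random cipher characteristic can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `mean_equivocation` and the parameter `p` (standing for S/H) are illustrative choices, and logarithms are taken to base 2. It evaluates the binomial sum for the mean equivocation directly and compares it with log(Sk/H) when the expected number of lines is large, and with \lambda log 2 when it is small.

```python
import math

def mean_equivocation(k, p):
    """Mean equivocation of key, in bits, for the random cipher.

    k : number of equally likely keys (lines leading back from each E)
    p : S/H, the fraction of messages in the high probability group
    Evaluates Q = (1/(p k)) * sum_m C(k,m) p^m (1-p)^(k-m) m log2(m).
    """
    lam = p * k  # expected number of lines reaching the high probability group
    total = 0.0
    for m in range(2, k + 1):  # the m = 1 term vanishes because log 1 = 0
        # Binomial probability computed in log space to avoid overflow.
        log_pmf = (math.lgamma(k + 1) - math.lgamma(m + 1) - math.lgamma(k - m + 1)
                   + m * math.log(p) + (k - m) * math.log1p(-p))
        total += math.exp(log_pmf) * m * math.log2(m)
    return total / lam

# Large expected m (lambda = 500): compare with log2(lambda).
q_large = mean_equivocation(10_000, 0.05)
print(q_large, math.log2(500))

# Small expected m (lambda = 0.01): compare with lambda * log2(2) = lambda.
q_small = mean_equivocation(1_000, 1e-5)
print(q_small, 0.01)
```

The direct sum is practical only for moderate k; it is meant as a check on the approximations rather than as a way of computing the characteristic for real key sizes.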

By a similar argument given in the appendix, the
equivocation of message can be calculated. It is

        Q(M) = |M_0| = R_0 N        for R_0 N << Q(K) = |K| - D N
        Q(M) = Q(K)                 for R_0 N >> Q(K)
        Q(M) = Q(K) - \varphi(N)    for R_0 N ~ Q(K)

where \varphi(N) is the function of Fig. 14, with N scale
reduced by a factor of D/R_0. Q(M) rises linearly with slope
R_0 until this line intersects the Q(K) line. After a rounded
transition it follows Q(K) down.

Most ciphers have an equivocation characteristic of this
general type, approaching zero rather sharply. We will call
the number of letters required for near unicity of solution
the unicity distance.

24.    Application to Standard Ciphers.

The characteristic derived for the random cipher may be
expected to apply approximately in many cases, providing some
precautions are taken and certain corrections are made. The
main points to be observed are the following.

    1.  We assumed in deriving the random characteristic
        that the possible decipherments of a cryptogram
        are a random selection from the possible messages.
        This is not true in actual cases, but becomes more
        nearly true as the complexity of the operations
        used in the enciphering process and the complexity
        of the language structure increase. The more
        complicated the type of cipher, the more it should
        follow the random characteristic. In the case of
        a transposition cipher it is clear that letter
        frequencies are preserved. This means that the
        possible decipherments are chosen from a more
        limited group - not the entire message space -
        and the formula should be changed. In place of
        R_0 one uses R_1, the rate for independent letters
        but with the regular frequencies. This changes
        the redundancy from

            D = R_0 - R = .707 digits/letter

        to

            D' = R_1 - R = .538 digits/letter

        and the equivocation reduces more slowly. In
        some other cases a definite tendency toward
        returning the decipherments to high probability
        messages can be seen. If there is no clear
        tendency of this sort, and the system is fairly
        complicated, and the language a natural one
        (with its very complex statistical structure),
        then it is reasonable to make the random cipher
        assumption.

    2.  In many cases the key does not all appear as
        soon as it might. For example in simple
        substitution one must wait for a long time to find
        all letters of the alphabet represented in the
        message and thus deduce the complete key. The
        message becomes unique long before this point.
        Obviously our random assumption falls down in
        such a case, since all the different keys which
        differ only in the letters not yet appearing
        lead back to the same message, and are not
        randomly distributed. This error is easily
        corrected by the use of the key appearance
        characteristic. One uses at a particular N the
        amount of key that may be expected at that point
        in the formula for Q.

    3.  There are certain "end effects" due to the definite
        starting of the message which produce a discrepancy
        from the random characteristics. If we take a
        random starting point in English text the first
        letter (when we do not observe the preceding
        letters) has a possibility of being any letter with
        the ordinary letter probabilities. The next
        letter is more completely specified since we
        then have digram frequencies. This decrease
        in choice value continues for some time. The
        effect of this on the curve is that the straight
        line part is displaced, and approached by a
        curve depending on how much the statistical
        structure of the language is spread out over
        adjacent letters. As a first approximation
        the curve can be corrected by shifting the line
        over to the half redundancy point - i.e., the
        number of letters where the language redundancy
        is half its final value.

If account is taken of these three effects, reasonable
estimates of the equivocation characteristic and unicity
point can be made. The calculation can be done graphically
as indicated in Figs. 15 and 16. One draws the key appearance
characteristic |K| - [illegible] and the total redundancy
curve |M_0| - |M| (which is usually sufficiently well
represented by the line ND). The difference between these out
to the neighborhood of their intersection is Q(M). For the
simple substitution the characteristic is shown in Fig. 17.
In so far as experimental checks could be carried out they
fit this curve very well. For example, the unicity point, at
about 27 letters, can be shown experimentally to lie between
the limits 22 and 30. With 30 letters one nearly always has
a unique solution to a cryptogram of this type and with 22 it
is usually easy to find a number of them.

With transposition of period d, the unicity point occurs at
about 1.5 d log d/e. This also checks fairly well
experimentally. Note that in this case Q is defined only for
integral multiples of d.

With the Vigenere the unicity point will occur at about
2d + 2 letters, and this too is about right. The Vigenere
characteristic with the same key size as simple substitution
will be approximately as shown in Fig. 18. The Vigenere,
Playfair and Fractional cases are more likely to follow the
theoretical formulas for random ciphers than simple
substitution and transposition. The reason for this is that
they are more complex and give better mixing characteristics
to the messages on which they operate.
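The straight-line part of the random cipher characteristic gives a quick estimate of where the equivocation reaches zero, namely N = |K|/D. The sketch below is illustrative only: the function names are hypothetical, and it ignores the key appearance and end effect corrections discussed above. It applies the estimate to a Vigenere of period d, using the redundancy of .707 digits per letter quoted earlier, and reproduces the "about 2d" part of the figure given in the text.

```python
import math

D = 0.707  # redundancy in digits per letter, the figure quoted for normal English

def unicity_estimate(key_size, redundancy):
    """Letters at which the line |K| - N*D reaches zero (same units for both)."""
    return key_size / redundancy

def vigenere_key_size(d):
    """|K| in decimal digits for period d, with 26 equally likely shifts per position."""
    return d * math.log10(26)

for d in (5, 10, 20):
    print(d, round(unicity_estimate(vigenere_key_size(d), D), 1))
```

For other systems the same function can be used with the appropriate key size, but the result should be read as the uncorrected intersection point, not as the corrected unicity point.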

The mixed alphabet Vigenere (each of d alphabets mixed
independently and used sequentially) has a key size

        |K| = d log 26! = 26.3 d

and its unicity point should be at about 53 d + 2 letters.

These conclusions can also be put to a rough experimental
test with the Caesar type cipher. In the particular
cryptogram analyzed in Table I, section 19, the function
Q(N) has been calculated and is given below, together with
the values for a random cipher.

        N                 0

        Q (observed)      1.41
        Q (calculated)    1.41

The agreement is seen to be quite good, especially when we
remember that the observed Q should actually be the average
of many different cryptograms, and that D for the larger
values of N is only roughly estimated.

It  appears  then  that  the  random  cipher  analysis 
can  be  used  to  estimate  equivocation  characteristics  and 
the  unicity  distance  for  the  ordinary  types  of  ciphers. 

25.    Solving Systems Using Only N-Gram Structure.

The preceding analysis can also be applied to cases where
the cryptanalyst is assumed to know or use only a limited
knowledge of the structure of the language. If no data about
the language other than the digram frequencies is used in
solving cryptograms the equivocation curves may be computed,
using for the redundancy curve that obtained from D_2 alone.
This curve lies below the curve for all redundancy and the
unicity point will therefore be moved to a larger N. Fig. 19
shows the Q curves for simple substitution on normal English
when the cryptanalyst uses only digram structures.

26.    Validity of a Cryptogram Solution.

The equivocation formulas are relevant to questions which
sometimes arise in cryptographic work regarding the validity
of an alleged solution to a cryptogram. In the history of
cryptography one finds many cryptograms, or possible
cryptograms, where clever analysts have found a "solution".
It involved, however, such a complex process, or the material
was so scanty, that the question arose as to whether the
cryptanalyst had "read a solution" into the cryptogram. See
for example the Bacon-Shakespeare ciphers and the "Roger
Bacon" manuscript.*

In general we may say that if a proposed system and key
solves a cryptogram for a length of material considerably
greater than the unicity distance the solution is
trustworthy. If the material is of the same order or shorter
than the unicity distance the solution is highly suspicious.

This effect of redundancy in gradually producing a unique
solution to a cipher can be thought of in another way which
is helpful. The redundancy is essentially a series of
conditions on the letters of the message, which insure that
it be statistically reasonable. These consistency conditions
produce corresponding consistency conditions in the
cryptogram. The key gives a certain amount of freedom to the
cryptogram, but as more and more letters are intercepted,
the consistency conditions use up the freedom allowed by the
key. Eventually there is only one message and key which
satisfy all the conditions and we have a unique solution.
In the random cipher the consistency conditions are in a
sense "orthogonal" to the "grain of the key", and have their
full effect in eliminating messages and keys as rapidly as
possible. This is the usual case. However, by proper design
it is possible to "line up" the redundancy of the language
with the "grain of the key" in such a way that the
consistency conditions are automatically satisfied and Q
does not approach zero. These "ideal" systems are of such
a nature that the transformations T_i all induce the same
probabilities in the E space. Ideal characteristics are
shown in Fig. 20.

27.    Ideal  Secrecy  Systems. 

We have seen that perfect secrecy requires an infinite
amount of key. With a finite key size, the equivocation of
key and message generally approach zero, but not necessarily
so. In fact it is possible for Q(K) to remain constant at
its initial value |K|. Then, no matter how much material is
intercepted, there is not a unique solution but many of
comparable probability. We will define an "ideal" system as
one in which Q(K) and Q(M) do not approach zero as
N \to \infty. A "strongly ideal" system is one in which Q(K)
remains constant at |K|.


*See Fletcher Pratt, "Secret and Urgent".

An example is a simple substitution on an artificial
language in which all letter probabilities are the same and
each letter independently chosen. It is clear that
Q(K) = |K| and Q(M) rises linearly along a line of slope R_0
until it strikes the line Q(K), after which it remains
constant at this value.

With natural languages it is in general possible to
approximate the ideal characteristic - the unicity point can
be made to occur for as large N as is desired. The
complexity of the system needed usually goes up rapidly as
we attempt to do this, however. It is not always possible
to actually attain the ideal characteristic with any system
of finite complexity.

To approximate the ideal equivocation, one may first operate
on the message with a transducer which reduces it to the
normal form - i.e., with all redundancies removed. After
this almost any simple ciphering system - substitution,
transposition, Vigenere, etc. - is satisfactory. The more
elaborate the transducer and the nearer the output is to
normal form, the more closely will the secrecy system
approximate the ideal characteristic.

Theorem 20:    A necessary and sufficient condition that T be
strongly ideal is that for any two keys T_i^{-1} T_j is a
measure preserving transformation of \Omega_M into itself.

This is true since the a posteriori probability of each key
is equal to its a priori probability if and only if this
condition is satisfied.

28.    Examples of Ideal Secrecy Systems.

Suppose our language consists of a sequence of letters all
chosen independently and with equal probability. Then the
redundancy is zero, |M_0| = |M|, and from Theorem 11
Q(K) = |K|. We obtain the result

Theorem 21:    If all letters are equally likely and
independent any closed cipher is strongly ideal.

The equivocation of message will rise along the key
appearance characteristic |K| - [illegible], which will
usually approach |K|, although in some cases it does not. In
the cases of N-gram substitution, transposition, Vigenere and
variations, fractional, etc., we have strongly ideal systems
for this simple language with Q(M) \to |K| as N \to \infty.

If the letters are independent but are not all equally
probable, the transposition cipher characteristics remain
essentially the same. The asymptotic equivocations of both
key and message are clearly |K|. In the substitution cipher
they will be less. If all the letter probabilities are
different, then the asymptotic equivocations of both key and
message are zero. The letters can all eventually be
determined by frequency count (apart from certain exceptional
sequences of zero measure). Suppose now that there are 7
letters with probabilities

        p_1 = p_2 < p_3 < p_4 = p_5 = p_6 < p_7

In this case we cannot separate p_1 from p_2, or p_4, p_5 and
p_6 from each other, but the different unequal probability
groups can be eventually separated.

If all substitutions are a priori equally likely, there will
be an asymptotic uncertainty among

        2! \times 3!

equally likely (a posteriori) keys. Hence, the asymptotic Q
will be

        Q = \log (2! \, 3!)

In general it is clear that the asymptotic equivocation with
a substitution where the different substitutions are equally
likely is

        Q_\infty(M) = Q_\infty(K) = \log H

where H is the order of the group of substitutions on the
letter probabilities p_1 ... p_n which leave this set
invariant.
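The order of the invariance group is simply the product of the factorials of the sizes of the equal-probability classes, since letters can be interchanged only within a class. The following sketch illustrates this; the function name and the particular probability values are hypothetical, chosen only to reproduce the equality pattern of the example (two equal, one distinct, three equal, one distinct).

```python
import math
from collections import Counter
from fractions import Fraction

def invariance_group_order(probs):
    """Order of the group of letter permutations that leave the set of
    letter probabilities invariant (permutations within equal-probability classes)."""
    order = 1
    for class_size in Counter(probs).values():
        order *= math.factorial(class_size)
    return order

# Hypothetical values with the pattern p1 = p2 < p3 < p4 = p5 = p6 < p7.
probs = [Fraction(1, 20), Fraction(1, 20), Fraction(2, 20),
         Fraction(3, 20), Fraction(3, 20), Fraction(3, 20), Fraction(7, 20)]

order = invariance_group_order(probs)
print(order, math.log10(order))  # group order and asymptotic equivocation in digits
```

Exact fractions are used so that equal probabilities compare equal; with floating point values a tolerance would be needed to form the classes.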

More generally we can consider an arbitrary pure system T
and a pure language L. Suppose that T operates only
"locally" on the letters of M in the sense that the nth
letter of cryptogram depends only on n and a certain finite
number of the letters of M in the neighborhood of the nth
one:

        e_n = f(K, n; m_{n-p}, ..., m_{n+p}).

Then we can show that there is a certain subgroup of the
transformations T_i^{-1} T_j which are probability preserving
in the language L. In the limiting cases these would consist
of the identity or of the whole group T_i^{-1} T_j.

Theorem 22:    Under these conditions the asymptotic
equivocation of key is the logarithm of the order of this
subgroup of measure preserving transformations.

An ideal secrecy system suffers from a number of
disadvantages.

    1.  The system must be closely matched to the language.
        This requires an extensive study of the structure
        of the language by the designer. Also a change in
        statistical structure or a selection from the set
        of possible messages as in the case of probable
        words (words expected in this particular cryptogram)
        renders the system vulnerable to analysis.

    2.  The structure of natural languages is extremely
        complicated, and this reflects in a complexity of
        the transformations required to reduce them to
        the normal form. Thus any machine to perform this
        operation must necessarily be quite involved, at
        least in the direction of information storage,
        since a "dictionary" of magnitude greater than
        that of an ordinary dictionary is to be expected.

    3.  In general, reduction of a natural language to a
        normal form introduces a bad propagation of error
        characteristic. Error in transmission of a single
        letter produces a region of changes near it of
        size comparable to the length of statistical
        effects in the original language.

29.    Multiple Substitute Ideal Systems.

There is another way of obtaining ideal or nearly ideal
characteristics using multi-valued secrecy systems. Suppose
our language contains only three letters with probabilities
1/8, 3/8 and 4/8, and that successive letters in a message
are chosen independently. Let there be 1 substitute for the
first letter, 3 for the second and 4 for the third, and
choose at random among the possible substitutes for a
letter. It is clear that this system is ideal. If the
different probabilities are incommensurable, we cannot
exactly achieve the ideal behavior, but can approximate it,
by using enough substitutes, as closely as desired.
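The three-letter example can be sketched directly. In the following illustration the letter names "a", "b", "c", the substitute symbols 0 to 7 and the particular assignment of substitutes to letters are all assumptions made for the sake of the example; in an actual system the assignment would be the key. The sketch enciphers a long random message and counts the substitutes, which should come out nearly equally frequent.

```python
import random

# Hypothetical three-letter language with probabilities 1/8, 3/8 and 4/8.
# One particular assignment of the eight substitutes (this plays the role of the key).
SUBSTITUTES = {"a": [0], "b": [1, 2, 3], "c": [4, 5, 6, 7]}
INVERSE = {s: letter for letter, subs in SUBSTITUTES.items() for s in subs}

def encipher(message, rng):
    """Replace each letter by one of its substitutes, chosen at random."""
    return [rng.choice(SUBSTITUTES[ch]) for ch in message]

def decipher(cryptogram):
    """Each substitute stands for exactly one letter, so deciphering is unique."""
    return "".join(INVERSE[s] for s in cryptogram)

rng = random.Random(0)
message = "".join(rng.choices("abc", weights=[1, 3, 4], k=8000))
cryptogram = encipher(message, rng)

counts = [cryptogram.count(s) for s in range(8)]
print(counts)  # each substitute should appear roughly 1000 times
```

Because the substitutes are equally likely and independent, the cryptogram carries no statistical structure for the cryptanalyst to work on, which is the sense in which the system is ideal.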

If the language is more complex, with transition
probabilities, this general method can still be used, but it
becomes more involved. Suppose the choice of a letter
depends only on the two preceding letters, not on any more
remote part of the message. The transition probabilities
p_{ij}(k) completely describe the statistical structure of
the language. We supply substitutes for k when it follows
i, j in proportion to p_{ij}(k). Of all our m substitutes,
m p_{ij}(k) represent k after the pair i, j. As before one
chooses from the possible substitutes for a letter at
random. The cryptogram will then be a random sequence of the
m substitute letters.

As an example, suppose the p_i(j) are the only statistics of
the language and the values are given by

        i\j     1     2     3

        1      .1    .3    .6
        2      .2    .5    .3
        3      .9    .1     0

With 10 substitutes 0, 1, 2, ..., 9 we construct a
substitution table assigning substitutes (chosen randomly) in
proportion to the frequencies. The following is a typical
key.


        i\j     1                    2            3

        1       7                    0,5,6        1,2,3,4,8,9
        2       3,9                  1,2,5,6,7    0,4,8
        3       0,1,2,3,5,6,7,8,9    4            -


If a 3 follows a 2 in the message we substitute one of 0, 4,
8 for it, the choice being random. A second table must be
supplied for the first letter of the message, corresponding
to the unconditional probabilities of the three letters.
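The digram example can be sketched as follows. The transition probabilities and the key are those of the example, with one assumption: the substitutes for a 2 following a 2 are taken to be the five symbols not otherwise used in that row. The function names are illustrative, and for simplicity the first letter is handed to the decipherer directly instead of being enciphered with the second table mentioned above.

```python
import random

# Transition probabilities p_i(j): outer key i is the preceding letter.
P = {1: {1: 0.1, 2: 0.3, 3: 0.6},
     2: {1: 0.2, 2: 0.5, 3: 0.3},
     3: {1: 0.9, 2: 0.1, 3: 0.0}}

# The "typical key". The (2, 2) cell is an assumption: the unused substitutes of row 2.
KEY = {1: {1: [7], 2: [0, 5, 6], 3: [1, 2, 3, 4, 8, 9]},
       2: {1: [3, 9], 2: [1, 2, 5, 6, 7], 3: [0, 4, 8]},
       3: {1: [0, 1, 2, 3, 5, 6, 7, 8, 9], 2: [4], 3: []}}

def check_key(key, probs, m=10):
    """Each row must use every substitute exactly once, in proportion to p_i(j)."""
    for i in key:
        used = sorted(s for subs in key[i].values() for s in subs)
        assert used == list(range(m))
        for j in key[i]:
            assert len(key[i][j]) == round(m * probs[i][j])

def encipher_tail(message, key, rng):
    """Encipher every letter after the first, given the letter preceding it."""
    return [rng.choice(key[prev][cur]) for prev, cur in zip(message, message[1:])]

def decipher_tail(first, cryptogram, key):
    """Recover the message from its first letter and the substitutes."""
    message = [first]
    for s in cryptogram:
        prev = message[-1]
        message.append(next(j for j, subs in key[prev].items() if s in subs))
    return message

check_key(KEY, P)
message = [1, 3, 1, 2, 2, 3, 1, 3, 1, 1]
cryptogram = encipher_tail(message, KEY, random.Random(1))
print(cryptogram, decipher_tail(message[0], cryptogram, KEY))
```

Deciphering requires the preceding plaintext letter at each step, so a single error in transmission can propagate through the rest of the message.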

Although of theoretical interest it is doubtful whether such
systems would be of much use practically because of their
complexity and message expansion in ordinary cases. However,
the first approximation to such systems, matching letter
frequencies, has been used in ciphers and is standard
practice in codes (where one matches word frequencies).

30.    Equivocation Rate.

We now return briefly to cases where the key is not finite,
but is supplied constantly, as in the Vernam system and the
running key cipher. In such cases we may define equivocation
"rates". One considers the equivocation Q(N) of the message
when N letters have been intercepted. The equivocation rate
for the message is defined as the limit (assuming it exists):

        \lim_{N \to \infty} \frac{Q(N)}{N} = Q'

The rate for equivocation of key would be defined similarly
using the equivocation in the part of the key that has been
used only, but of course these two are the same. There are
results for these parameters analogous to those obtained
with finite key cases. Let R' be the mean rate of using key.

Theorem 23:

        Q' \le R'

In case the equality holds we have the analogue of ideal
systems where the complete information of the key goes into
equivocation. If R' \ge R, the rate of the message source,
we can obtain perfect secrecy - in fact we may define
perfect secrecy as the case in which Q' = R.

In the random case we have the analogous result

        Q' = R' - D.

31.    Further Remarks on Equivocation and Redundancy.

We have taken the redundancy of "normal English" to be about
.7 digits per letter or 50% of R_0. This is on the
assumption that word divisions were omitted. It is an
approximate figure based on statistical structure of the
order of lengths of perhaps 8 letters, and assumes the text
to be of an ordinary type, such as newspaper writing,
literary work, etc. Various methods of calculating
redundancy have been devised and will be described in the
memorandum on information mentioned in the introduction. We
may note here two methods of roughly estimating this number
which are of cryptographic interest.

A running key cipher is a Vernam type system where in place
of a random sequence of letters the key is a meaningful
text. Now it is known that running key ciphers can usually
be solved uniquely. This shows that English can be reduced
by a factor of two to one and implies a redundancy of at
least 50%. This figure cannot be reduced very much, however,
for a number of reasons, unless long range "meaning"
structure of English is considered.

The running key cipher can be easily improved to lead to
ciphering systems which could not be solved without the key.
If one uses in place of one English text, about 4 different
texts as key, adding them all to the message, a sufficient
amount of key has been introduced to produce a high positive
equivocation rate. Another method would be to use say every
10th letter of the text as key. The intermediate letters are
omitted and cannot be used at any other point of the
message. This has the same effect, since the mean rate for
these spaced letters must be over .8 R_0.

These methods might be useful for spies or diplomats who
could use books or magazines for the key source.
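Both improvements are easy to sketch. In the following illustration the key texts, the message and all function names are hypothetical, and letters are added modulo 26 as in a Vigenere-style running key. The first method sums several texts into one key stream; the second keeps only every 10th letter of a single text.

```python
import string

A = string.ascii_lowercase

def letters(text):
    """Numeric values (0-25) of the alphabetic characters in text."""
    return [A.index(ch) for ch in text.lower() if ch in A]

def combine_keys(key_texts, n):
    """Sum the first n letters of each key text modulo 26."""
    streams = [letters(t) for t in key_texts]
    if any(len(s) < n for s in streams):
        raise ValueError("every key text must supply at least n letters")
    return [sum(s[i] for s in streams) % 26 for i in range(n)]

def encipher(message, key_texts):
    m = letters(message)
    k = combine_keys(key_texts, len(m))
    return "".join(A[(x + y) % 26] for x, y in zip(m, k))

def decipher(cryptogram, key_texts):
    e = letters(cryptogram)
    k = combine_keys(key_texts, len(e))
    return "".join(A[(x - y) % 26] for x, y in zip(e, k))

def every_nth(text, n=10):
    """Key formed from every nth letter of a text; the rest are discarded."""
    return "".join(A[i] for i in letters(text)[::n])

# Four hypothetical key texts, e.g. agreed passages from books or magazines.
keys = ["the key gives a certain amount of freedom",
        "eventually there is only one message",
        "this is the usual case however by proper design",
        "the redundancy is essentially a series of conditions"]

cryptogram = encipher("meet at dawn", keys)
print(cryptogram, decipher(cryptogram, keys))
```

A key produced by `every_nth` can be passed to `encipher` as a single-element list, provided the source text is at least ten times as long as the message.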

A second way of showing the high redundancy of English is to
delete all vowels from a passage. In general it is possible
to fill them in again uniquely and recover the original,
without knowing it in advance. As the vowels constitute
about 40% of the text this puts a limit on the redundancy.
Actually there is considerable redundancy left, the various
letter and digram frequencies being far from uniform.

This suggests a simple way of greatly improving almost any
simple ciphering system. First delete all vowels, or as much
of the message as possible without running the risk of
multiple solutions, and then encipher the residue. Since
this reduces the redundancy by a factor of perhaps 3 or 4 to
1, the unicity point will be moved out by this factor. This
is one way of approaching ideal systems - using the
decipherer's knowledge of English as part of the deciphering
system.
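The preprocessing step can be sketched in a few lines. This is an illustration only: the sample passage is arbitrary, a Caesar shift stands in for "almost any simple ciphering system", and the restoration of the vowels is left to the human decipherer, as in the text.

```python
VOWELS = set("aeiou")

def delete_vowels(text):
    """Remove all vowels, leaving consonants, spaces and punctuation."""
    return "".join(ch for ch in text if ch.lower() not in VOWELS)

def caesar(text, shift):
    """Shift lowercase letters by a fixed amount; other characters are unchanged."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        else:
            out.append(ch)
    return "".join(out)

passage = "the redundancy is essentially a series of conditions on the letters"
residue = delete_vowels(passage)
cryptogram = caesar(residue, 3)

n_letters = sum(ch.isalpha() for ch in passage)
n_removed = n_letters - sum(ch.isalpha() for ch in residue)
print(residue)
print(cryptogram)
print(n_removed / n_letters)  # fraction of letters deleted before enciphering
```

In practice the word spaces would also be removed before enciphering; they are kept here only to make the residue easier to read.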

Two extremes of redundancy in English prose are represented
by Basic English and Joyce's "Finnegans Wake". The Basic
English vocabulary consists of only 850 words and a rough
estimate puts the redundancy at about 70%. A cipher applied
to this sort of text would rapidly approach unicity. Joyce,
on the other hand, would be relatively easy [illegible]. The
small redundancy is disclosed by the difficulty in filling in
correctly even a single missing letter from "Finnegans
Wake". What the numerical value is would be difficult to
determine; it varies widely throughout the book.

The mathematical extremes of redundancy, 0 and 100%, can be
constructed in artificial languages. In the latter case we
have, e.g., a single possible message. Q(M) is 0 identically
and Q(K) in the random cipher case declines as rapidly as
possible, i.e., as rapidly as one sends information on the
system. In the other extreme all letter sequences are
equally likely, and any closed ciphering system is ideal.

We may refer here to a memorandum by Nyquist (Enciphering -
Effect of Redundancy in Language, May 30, 1944) in which
some questions of the type we are considering here are
discussed.

32.    Distribution of Equivocation.

A more complete description of a secrecy system applied to a
language than is afforded by the equivocation
characteristics can be found by giving the distribution of
equivocation. For N intercepted letters we consider the
fraction of cryptograms for which Q (for these particular
E's, not the mean Q) lies between certain limits. This gives
a density distribution function

        P(Q, N) dQ

for the probability that, for N letters, Q lies between the
limits Q and Q + dQ. The mean equivocation we have
previously studied is the mean of this distribution,

        \bar{Q} = \int Q P(Q, N) dQ.

The function P(Q, N) can be thought of as plotted along a
third dimension, normal to the paper, on the Q, N plane. If
the language is pure, with a small influence range (compared
to N), and the cipher is pure, the function P(Q, N) will
usually be a ridge in this plane whose highest point follows
approximately the mean Q, at least until near the unicity
point. In this case, or when the conditions are nearly
verified, the mean Q curve gives a reasonably complete
picture of the system.

On the other hand, if the language is not pure, but made up
of a set of pure components

        L = \sum_i p_i L_i

having different equivocation curves with the system, say
Q_1, Q_2, ..., Q_n, then the total Q distribution will
usually be made up of a series of ridges. There will be one
for each L_i, weighted in accordance with its p_i. The mean
equivocation characteristic will be a line somewhere in the
midst of these ridges and may not give a very complete
picture of the situation. This is shown in Fig. 21.

A similar effect occurs if the system is not pure but made
up of several systems with different Q curves. There is then
a series of ridges in the P(Q, N) plot, and the mean Q
strikes an average which may lie between ridges and be a
very improbable value of Q for a particular cryptogram.
These effects are illustrated in Fig. 22.

The effect of mixing pure languages which are near to one
another in statistical structure is to increase the width of
the ridge. Near the unicity point this tends to raise the
mean equivocation, since equivocation cannot become negative
and the spreading is chiefly in the positive direction. We
expect, therefore, that in this region the calculations
based on the random cipher should be somewhat low.




PART  III 

Practical Secrecy

33.    The Work Characteristic.

After the unicity point has been passed there will usually
be a unique solution to the cryptogram. The problem of
isolating this single solution of high probability is the
problem of cryptanalysis. In the region before the unicity
point we may say that the problem of cryptanalysis is that
of isolating all the possible solutions of high probability
(compared to the remainder) and determining their various
probabilities.

Although it is always possible in principle to determine
these solutions (by trial of each possible key, for
example), different enciphering systems show a wide
variation in the amount of work required. The average amount
of work to determine the key for a cryptogram of N letters,
W(N), measured say in man hours, may be called the work
characteristic of the system. This average is taken over all
messages and all keys with their appropriate probabilities.
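The method of trial of each possible key can be sketched for the Caesar cipher, where there are only 26 keys. This is an illustration under stated assumptions: the letter frequencies are approximate reference values, the chi-squared score stands in for the judgment of the cryptanalyst, and the number of trial decipherments is used as a crude proxy for the work, which the text measures in man hours.

```python
import string

A = string.ascii_lowercase
# Approximate English letter frequencies in percent (assumed reference values).
FREQ = dict(zip(A, [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
                    6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074]))

def shift_text(text, shift):
    """Caesar shift of a string of lowercase letters."""
    return "".join(A[(A.index(ch) + shift) % 26] for ch in text)

def chi_squared(text):
    """Distance between the letter counts of text and the expected English counts."""
    n = len(text)
    return sum((text.count(ch) - n * f / 100) ** 2 / (n * f / 100)
               for ch, f in FREQ.items())

def trial_of_each_key(cryptogram):
    """Try all 26 keys; return (best shift, number of trial decipherments)."""
    scores = [(chi_squared(shift_text(cryptogram, -k)), k) for k in range(26)]
    return min(scores)[1], len(scores)

plain = ("theproblemofisolatingthissinglesolutionofhighprobability"
         "istheproblemofcryptanalysis")
cryptogram = shift_text(plain, 7)
print(trial_of_each_key(cryptogram))
```

For a system with a large key the number of trials grows with the number of keys, which is why exhaustive trial is possible in principle for every system and practical for very few.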


For a simple substitution on English the work and
equivocation characteristics would be somewhat as shown in
Fig. 23. The dotted portion of the curve is where there are
numerous possible solutions and these must all be
determined. In the solid portion after the unicity point
only one solution exists in general, but if only the minimum
necessary data are given a great deal of work must be done
to isolate it. As more material is used the work rapidly
decreases toward some asymptotic value - where the
additional data no longer reduces the labor.

I      ,  This  is  the  work  characteristic  for  the  key.    It  : 

*  \         '.     clear  that  after  the  unicity  point  this  function  can  never  : 

•  *■  1  creese.    There  is  also  a  work  characteristic:  fdr  the  messag 

the  average  emount  of  work  to  determine  th;e;raessago  (or  all 
'  reasonable  messages)  .  .  This  will i,  ih  ordinary  cases ,  be  bel 
or  et  any  rate  not  far  above  the  work  characteristic  for  th 
key,  out  to  fairly  large  W.  since  generally  If  'the  key  is  d 
termined  it  is  easy  to  find  IS  by  the  deciphering  transformer 
For  very  largo  N,  howevdr,  this  function  will  incroa-se  due 
merely  to  the  lebor  of  deciphering  the  large  amount  of  inte: 
cepted  material.  .  - 
Essentially the same behavior can be expected with any type of
secrecy system where the equivocation approaches zero. The scale of
man hours required, however, will differ greatly with different
types of ciphers, even when the Q curves are about the same. A
Vigenère or compound Vigenère, for example, with the same key size
would have a much better (i.e., much higher) work characteristic. A
good practical secrecy system is one in which the W(N) curve remains
sufficiently high, out to the number of letters one expects to
transmit with the key, to prevent the enemy from actually carrying
out the solution, or to delay it to such an extent that the
information is obsolete.

We will consider in the following sections ways of keeping the
function W(N) large, even though the equivocation may be practically
zero. This is essentially a "max min" type of problem, as is always
the case when we have a battle of wits.* In designing a good cipher
we must maximize the minimum amount of work the enemy must do to
break it. It is not enough merely to be sure none of the standard
methods of cryptanalysis work - we must show that no method whatever
will break the system easily. This, in fact, has been the weakness
of many systems: they were designed to resist all the known methods
of solution, but had a structure leading to new methods which
disclosed weaknesses of their own.

The problem of good cipher design is essentially one of finding
difficult problems, subject to certain other conditions. This is a
rather unusual situation, since one is ordinarily seeking the simple
and easily soluble problems in a field.

How can we ever be sure that a system which is not ideal, and
therefore has a unique solution for sufficiently large N, will
require a large amount of work to break with every method of
analysis? There are two approaches to this problem.
(1) We can study the possible methods of solution available to the
cryptanalyst and attempt to describe them in sufficiently general
terms to cover any methods he might use. We then construct our
system to resist this "general" method of solution.
(2) We may construct our ciphers in such a way that breaking them is
equivalent to (or requires at some point in the process) the
solution of some problem known to be laborious. Thus, if we could
show that solving a system requires at least as much work as solving
a system of simultaneous equations in a large number of unknowns, of
a complex type, then we will have a lower bound of sorts for the
work characteristic.

*See von Neumann and Morgenstern, "Theory of Games". The situation
between the cipher designer and cryptanalyst can be thought of as a
"game" of a very simple structure; a zero-sum two person game with
complete information, and just two "moves". The cipher designer
chooses a system for his "move". Then the cryptanalyst is informed
of this choice and chooses a method of analysis. The "value" of the
play is the average work required to break a cryptogram in the
system by the method chosen.

The next three sections are aimed at these general problems. It is
difficult to define the pertinent ideas involved with sufficient
precision to obtain results in the form of mathematical theorems,
but it is believed that the conclusions, in the form of general
principles, are correct.

34.  Generalities on the Solution of Cryptograms

After the unicity distance has been exceeded in intercepted
material, any system can be solved in principle by merely trying
each possible key until the unique solution is obtained, i.e., a
deciphered message which "makes sense" in clear. A simple
calculation shows that this method of solution (which we may call
complete trial and error) is totally impractical except when the key
is absurdly small.

Suppose, for example, we have a key of 26! possibilities or about
26.3 digits, the same size as in simple substitution on English.
This is, by any significant measure, a small key. It can be written
on a small slip of paper, or memorized in a few minutes. It could be
registered on 27 switches each having ten positions or on 88
two-position switches.

Suppose further, to give the cryptanalyst every possible advantage,
that he constructs an electronic device to try keys at the rate of
one each microsecond (perhaps automatically selecting from the
results by a $\chi^2$ test for statistical significance). He may
expect to reach the right key about half way through, and after an
elapsed time of about

$$\frac{2 \times 10^{26}}{2 \times 60^{2} \times 24 \times 365 \times 10^{6}} \approx 3 \times 10^{12} \text{ years.}$$
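The elapsed-time estimate can be checked with a few lines of arithmetic. The following is only a sketch: the function name `brute_force_years` and its parameters are illustrative assumptions, not anything from the memorandum, and a 365-day year is assumed as in the formula.

```python
import math

def brute_force_years(num_keys: int, keys_per_second: float) -> float:
    """Expected years of complete trial and error, assuming the right
    key is reached about half way through the key space."""
    seconds_per_year = 60 * 60 * 24 * 365
    expected_trials = num_keys / 2
    return expected_trials / keys_per_second / seconds_per_year

# Rounded key count used in the text, at one trial per microsecond.
print(f"{brute_force_years(2 * 10**26, 1e6):.2e} years")

# The same estimate with the exact value of 26!.
print(f"{brute_force_years(math.factorial(26), 1e6):.2e} years")

# Size of the key in decimal digits and in two-position switches.
print(f"{math.log10(math.factorial(26)):.1f} digits")
print(f"{math.log2(math.factorial(26)):.1f} two-position switches")
```

The exact value of 26! is roughly 4 x 10^26, so the rounded figure in the text is, if anything, on the low side.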

In other words, even with a small key complete trial and error will
never be used in solving cryptograms, except in the trivial case
where the key is extremely small, e.g., the Caesar with only 26
possibilities, or 1.4 digits. The trial and error which is used so
commonly in cryptography is of a different sort, or is augmented by
other means. If one had a secrecy system which required complete
trial and error it would be extremely safe. Such a system would
result, it appears, if the original messages, all say of 1000
letters, were a random selection of $2^{RN}$ from the set of all
$2^{R_0 N}$ sequences of 1000 letters. If any of the simple ciphers
were applied to these it seems that little improvement over complete
trial and error would be possible.

The methods actually used often involve a great deal of trial and
error, but in a different way. First, the trials progress from more
probable to less probable hypotheses, and second, each trial
disposes of a large group of keys, not a single one. Thus the key
space may be divided into say 10 subsets, each containing about the
same number of keys. By at most 10 trials one determines which
subset is the correct one. This subset is then divided into several
secondary subsets and the process repeated. With the same key size
($K = 26! \approx 2 \times 10^{26}$) we would expect about 26 x 5 or
130 trials as compared to $10^{26}$ by complete trial and error. The
possibility of choosing the most likely of the subsets first for
test would improve this result even more. If the divisions were into
two compartments (the best way) only 90 trials would be required.
Whereas complete trial and error requires trials to the order of the
number of keys, this subdividing trial and error requires only
trials to the order of the key size in alternatives.
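The trial counts for subdividing trial and error can be reproduced approximately as follows. This is a sketch under an assumption that is not stated in the text: each trial is taken to be a yes/no test of a single subset, so that with S subsets the correct one is found after (S + 1)/2 - 1/S tests on average (the last subset never needs a test of its own). The function names are illustrative.

```python
import math

def expected_tests_per_level(subsets: int) -> float:
    """Average number of yes/no tests to find which of `subsets`
    equally likely subsets holds the key, testing one at a time."""
    return (subsets + 1) / 2 - 1 / subsets

def subdividing_trials(num_keys: int, subsets: int) -> float:
    """Number of subdivision levels times the average tests per level."""
    levels = math.log(num_keys) / math.log(subsets)
    return levels * expected_tests_per_level(subsets)

keys = math.factorial(26)
print(round(subdividing_trials(keys, 10)))  # ten subsets per level
print(round(subdividing_trials(keys, 2)))   # two subsets per level
```

With the exact 26! the results come out somewhat different from the rounded 26 x 5 = 130 and 90 of the text, but of the same order, and in both cases vastly smaller than the number of keys.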

This remains true even when the different keys have different
probabilities. The proper procedure then to minimize the expected
number of trials is to divide the key space into subsets of
equiprobability. When the proper subset is determined, this is again
subdivided into equiprobability subsets. If this process can be
continued the number of trials expected when each division is into
two subsets will be

$$h = \frac{|K|}{\log 2}.$$

If each test has S possible results and each of these corresponds to
the key being in one of S equiprobability subsets, then

$$h = \frac{|K|}{\log S}$$

trials will be expected. The intuitive significance of these results
should be noted. In the two compartment test with equiprobability,
each test yields one alternative of information as to the key. If
the subsets have very different probabilities, as in testing a
single key in complete trial and error, only a small amount of
information is obtained from the test. Thus with 26! equiprobable
keys, a test of one yields only

$$-\left[\frac{26!-1}{26!}\log\frac{26!-1}{26!} + \frac{1}{26!}\log\frac{1}{26!}\right]$$

or about $10^{-25}$ alternatives of information. Dividing into S
equiprobability subsets maximizes the information obtained from each
trial at log S, and the expected number of trials is the total
information to be obtained, that is the key size, divided by this
amount.
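The information figures in this passage can be evaluated numerically. The sketch below assumes logarithms to base 2 (so that one "alternative" is one binary choice) and uses illustrative function names; `math.log1p` is used because 1 - 1/26! rounds to exactly 1 in floating point.

```python
import math

def info_from_single_key_test(num_keys: int) -> float:
    """Information, in alternatives, from testing one key out of
    num_keys equiprobable keys: the entropy of a yes/no outcome
    with probability p = 1/num_keys."""
    p = 1.0 / num_keys
    # log1p(-p) computes log(1 - p) accurately for very small p.
    return -(p * math.log2(p) + (1.0 - p) * math.log1p(-p) / math.log(2))

def expected_trials(num_keys: int, outcomes: int) -> float:
    """h = |K| / log S for equiprobable keys and S-way tests."""
    return math.log2(num_keys) / math.log2(outcomes)

keys = math.factorial(26)
print(f"{info_from_single_key_test(keys):.1e} alternatives per single-key test")
print(f"{expected_trials(keys, 2):.1f} two-way trials")
print(f"{expected_trials(keys, 10):.1f} ten-way trials")
```

A single-key test thus yields a vanishing fraction of an alternative, while an S-way equiprobable test yields the full log S.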

The question here is similar to various coin weighing problems that
have been circulated recently. A typical example is the following:
It is known that one coin in 27 is counterfeit, and slightly lighter
than the rest. A chemist's balance is available and the counterfeit
coin is to be isolated by a series of weighings. What is the least
number of weighings required to do this? The correct answer is 3,
obtained by first dividing the coins into three groups of 9 each.
Two of these are compared on the balance. The three possible results
determine the set of 9 containing the counterfeit. This set is then
divided into 3 subsets of 3 each and the process continued. The set
of coins corresponds to the set of keys, the counterfeit coin to the
correct key, and the weighing procedure to a trial or test.
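The weighing procedure just described can be sketched in code. The function name `find_light_coin` and the use of integer weights are assumptions made for illustration; the balance is simulated by comparing the summed weights of two equal-sized groups.

```python
def find_light_coin(weights: list[int]) -> tuple[int, int]:
    """Return (index of the lighter coin, number of weighings used),
    assuming exactly one coin is lighter than the rest."""
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = (len(candidates) + 2) // 3      # coins placed on each pan
        left = candidates[:third]
        right = candidates[third:2 * third]
        rest = candidates[2 * third:]           # coins left off the balance
        weighings += 1
        left_weight = sum(weights[i] for i in left)
        right_weight = sum(weights[i] for i in right)
        if left_weight < right_weight:
            candidates = left
        elif right_weight < left_weight:
            candidates = right
        else:
            candidates = rest                   # pans balance
    return candidates[0], weighings

coins = [10] * 27
coins[14] = 9                                   # the counterfeit
print(find_light_coin(coins))
```

Each weighing has three possible results and so divides the candidates into three subsets, which is the S = 3 case of the formula for the expected number of trials.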

This method of solution is feasible only if the key space can be
divided into a small number of subsets, with a simple method of
determining to which subset the correct key belongs. Stated in
another way, it is possible to solve for the key bit by bit. One
does not need to assume a complete key in order to apply a
consistency test and determine if the assumption is justified - an
assumption on a part of the key (or as to whether the key is in some
large section of the key space) can be tested.

This is one of the greatest weaknesses of most ciphering systems.
For example, in simple substitution, an assumption on a single
letter can be checked against its frequency, variety of contact,
doubles or reversals, etc. In determining a single letter the key
space is reduced by 1.4 digits from the original 26. The same effect
is seen in all the elementary types of ciphers. In the Vigenère, the
assumption of two or three letters of the key is easily checked by
deciphering at other points with this fragment and seeing whether
clear emerges. The compound Vigenère is much better from this point
of view, if we assume a fairly large number of component periods,
producing a repetition rate larger than will be intercepted. Here as
many key letters are used in enciphering each letter as there are
periods - although this is only a fraction of the entire key, at
least a fair number of letters must be assumed before a consistency
check can be applied.

Our first conclusion then, regarding practical small key cipher
design, is that a considerable amount of key should be used in
enciphering each small element of the message.

35.  Statistical Methods

It is possible to solve many kinds of ciphers by statistical
analysis. Consider again simple substitution. The first thing a
cryptographer does with an intercepted cryptogram is to make a
frequency count. If the cryptogram contains say 200 letters it is
safe to assume that few, if any, letters are out of their frequency
groups, this being a division into 4 sets of well defined frequency
limits. The log of the number of keys within this limitation may be
calculated as

$$\log\,(2!\; 9!\; 9!\; 6!) = 14.28$$

and the simple frequency count thus reduces the key uncertainty by
12 digits, a tremendous gain.
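The figure for the frequency groups is easy to verify. In this sketch the group sizes 2, 9, 9 and 6 are read off the formula; the function name is an illustrative assumption and logarithms are taken to base 10 to give digits.

```python
import math

def digits_of_keys_within_groups(group_sizes: list[int]) -> float:
    """log10 of the number of substitution keys that only permute
    letters inside their own frequency groups."""
    return math.log10(math.prod(math.factorial(n) for n in group_sizes))

groups = [2, 9, 9, 6]                 # four frequency groups, 26 letters in all
remaining = digits_of_keys_within_groups(groups)
total = math.log10(math.factorial(26))
print(f"{remaining:.2f} digits remain")
print(f"{total - remaining:.1f} digits removed by the frequency count")
```

The computed reduction comes out slightly above 12 digits because the exact value of log 26! is a little larger than the rounded key size used in the text.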

In general, a statistical attack proceeds as follows. A certain
statistic is measured on the intercepted cryptogram E. This
statistic is such that for all reasonable M it assumes about the
same value, $S_K$, the value depending only on the particular key K
that was used. The value thus obtained serves to limit the possible
keys to those which would give values of S in the neighborhood of
that observed. A statistic which does not depend on K, or which
varies as much with M as with K, is not of value in limiting K. Thus
in transposition ciphers, the frequency count of letters gives no
information about K - every K leaves this statistic the same. Hence
one can make no use of a frequency count in breaking transposition
ciphers.

More precisely one can ascribe a "solving power" to a given
statistic S. For each value of S there will be a conditional
equivocation of the key $Q_S(K)$, the equivocation when S has its
particular value and that is all that is known concerning the key.
The weighted mean of these values

$$\sum_S P(S)\, Q_S(K)$$

gives the mean equivocation of the key when S is known, P(S) being
the a priori probability of the particular value S. The key size |K|
less this mean equivocation measures the "solving power" of S.
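As an illustration of this definition, the solving power can be computed for a toy case. The sketch below rests on two simplifying assumptions that are not in the text: the keys are equiprobable, and the value of the statistic is fixed by the key alone (no variation with the message). Under these assumptions $Q_S(K)$ is simply the log of the number of keys giving that value. All names are illustrative.

```python
import math
from collections import Counter

def solving_power(statistic, keys) -> float:
    """|K| minus the mean equivocation sum_S P(S) Q_S(K), in
    alternatives, for equiprobable keys and a statistic that is a
    function of the key only."""
    counts = Counter(statistic(k) for k in keys)
    total = len(keys)
    key_size = math.log2(total)
    mean_equivocation = sum((n / total) * math.log2(n) for n in counts.values())
    return key_size - mean_equivocation

caesar_keys = list(range(26))
print(solving_power(lambda k: k % 2, caesar_keys))   # splits keys into two halves
print(solving_power(lambda k: 0, caesar_keys))       # independent of the key
print(solving_power(lambda k: k, caesar_keys))       # identifies the key outright
```

A statistic that does not depend on the key has zero solving power, in line with the remark on frequency counts and transposition ciphers.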

In a strongly ideal cipher all statistics of the cryptogram are
independent of the particular key used. This is the measure
preserving property of $T_j T_k^{-1}$ on the E space or
$T_j^{-1} T_k$ on the M space mentioned above.


There are good and poor statistics, just as there are good and poor
methods of trial and error. Indeed the trial and error testing of a
hypothesis is a type of statistic, and what was said above regarding
the best types of trials holds generally. A good statistic for
solving a system must have the following properties:

1.  It must be simple to measure.

2.  It must depend more on the key than on the message if it is
    meant to solve for the key. The variation with M should not
    mask its variation with K.

3.  The values of the statistic that can be "resolved" in spite of
    the "fuzziness" produced by variation in M should divide the
    key space into a number of subsets of comparable probability,
    with the statistic specifying the one in which the correct key
    lies. The statistic should give us sizable information about
    the key, not a tiny fraction of an alternative.

4.  The information it gives must be simple and usable. Thus the
    subsets in which the statistic locates the key must be of a
    simple nature in the key space.

Frequency count for simple substitution is an example of a very good
statistic.


Two methods (other than recourse to ideal systems) suggest
themselves for frustrating a statistical analysis. These we may call
the methods of diffusion and confusion. In the method of diffusion
the statistical structure of M which leads to its redundancy is
"dissipated" into long range statistics - i.e., into statistical
structure involving long combinations of letters in the cryptogram.
The effect here is that the enemy must intercept a tremendous amount
of material to tie down this structure, since the structure is
evident only in blocks of very small individual probability.
Furthermore even when he has sufficient material, the analytical
work required is much greater since the redundancy has been diffused
over a large number of individual statistics. An example of
diffusion of statistics is operating on a message
$M = m_1, m_2, m_3, \ldots$ with a "smoothing" operation, e.g.

$$y_n = \sum_{i=1}^{s} m_{n+i} \pmod{26},$$

adding s successive letters of the message to get a letter $y_n$.
One can show that the redundancy of the y sequence is the same as
that of the m sequence, but the structure has been dissipated. Thus
the letter frequencies in y will be more nearly equal than in m, the
digram frequencies also more nearly equal, etc. Indeed any
reversible operation which produces one letter out for each letter
in and does not have an infinite "memory" has an output with the
same redundancy as the input. The statistics can never be eliminated
without compression, but they can be spread out.
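The smoothing operation can be tried on a short text. This is a minimal sketch: the window here starts at the current letter rather than the next one (an index shift only), the output simply stops when the window runs off the end of the message, and the helper `max_letter_share` is an illustrative measure of how uneven the letter frequencies are. None of these choices comes from the text.

```python
from collections import Counter

A = ord("A")

def smooth(message: str, s: int) -> str:
    """Add s successive letters of the message mod 26 to get each
    output letter (message must contain only the letters A-Z)."""
    nums = [ord(ch) - A for ch in message]
    return "".join(
        chr(sum(nums[n:n + s]) % 26 + A)
        for n in range(len(nums) - s + 1)
    )

def max_letter_share(text: str) -> float:
    """Relative frequency of the most common letter."""
    return max(Counter(text).values()) / len(text)

clear = "THEENEMYMUSTINTERCEPTATREMENDOUSAMOUNTOFMATERIAL"
for s in (1, 2, 4):
    y = smooth(clear, s)
    print(s, y[:20], f"{max_letter_share(y):.2f}")
```

On a text this short the flattening of the letter frequencies is only suggestive; the claim in the text concerns the statistics of long sequences.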

The method of confusion is to make the relation between the simple
statistics of E and the simple description of K a very complex and
involved one. In the case of simple substitution it was easy to
describe the limitation of K imposed by the letter frequencies of E.
If the connection is very involved and confused the enemy can still
evaluate a statistic $S_1$, say, which limits the key to a region of
the key space. This limitation, however, is to some complex region
$R_1$ in the space - folded over many times - and he has a difficult
time making use of it. A second statistic $S_2$ limits K still
further to $R_2$, hence it lies in the intersection region
$R_1 R_2$, but this does not help much because it is so difficult to
determine just what the intersection is.

To be more precise let us suppose the key space has certain "natural
coordinates" $k_1, k_2, \ldots, k_p$ which he wishes to determine.
He measures a set of statistics $s_1, s_2, \ldots, s_n$ and these
are sufficient to determine the $k_i$. However, in the method of
confusion, the equations connecting these sets of variables are
involved and complex. We have, say,

$$
\begin{aligned}
f_1(k_1, k_2, \ldots, k_p) &= s_1 \\
f_2(k_1, k_2, \ldots, k_p) &= s_2 \\
&\;\;\vdots \\
f_n(k_1, k_2, \ldots, k_p) &= s_n,
\end{aligned}
$$

and all the $f_i$ involve all the $k_i$. The cryptographer must
solve this system simultaneously - a difficult job. In the simple
(not confused) cases the functions involve only a small number of
the $k_i$ - or at least some of these do. One first solves the
simpler equations, evaluating some of the $k_i$, and substitutes
these in the more complicated equations.

The conclusion here is that for a good ciphering system steps should
be taken either to diffuse or confuse the redundancy (or both).

36.  The Probable Word Method

One of the most powerful tools for breaking ciphers is the use of
probable words. The probable words may be words or phrases expected
in the particular message due to its source, or they may merely be
common words or syllables which occur in any text in the language,
such as the, and, tion, that, etc.

In general, the probable word method is used as follows. Assuming a
probable word to be at some point in the clear, the key or a part of
the key is determined. This is used to decipher other parts of the
cryptogram and provide a consistency test. If the other parts come
out in clear, the assumption is justified.
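The procedure can be illustrated on a simple Vigenère of known period. The following is a sketch only: it assumes the period is already known, that the probable word is at least as long as the period, and it uses as its consistency test the agreement of the key letters implied at positions that share the same place in the key. The function names, the sample text and the key are all hypothetical.

```python
A = ord("A")

def vigenere(text: str, key: str, sign: int = 1) -> str:
    """Add (sign=+1) or subtract (sign=-1) a repeating key mod 26."""
    return "".join(
        chr((ord(ch) - A + sign * (ord(key[i % len(key)]) - A)) % 26 + A)
        for i, ch in enumerate(text)
    )

def probable_word_candidates(cryptogram: str, word: str, period: int):
    """Try the probable word at every position and keep the placements
    whose implied key letters agree wherever they must be equal."""
    found = []
    for pos in range(len(cryptogram) - len(word) + 1):
        key = [None] * period
        consistent = True
        for i, m in enumerate(word):
            k = (ord(cryptogram[pos + i]) - ord(m)) % 26
            slot = (pos + i) % period
            if key[slot] is None:
                key[slot] = k
            elif key[slot] != k:
                consistent = False
                break
        if consistent and None not in key:
            found.append((pos, "".join(chr(k + A) for k in key)))
    return found

clear = "WEATTACKATDAWNFROMTHEEAST"
cryptogram = vigenere(clear, "KEY")
for pos, key in probable_word_candidates(cryptogram, "ATTACK", 3):
    print(pos, key, vigenere(cryptogram, key, -1))
```

Each surviving placement yields a trial key; deciphering the rest of the cryptogram with it and seeing whether clear emerges is the final check.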

There are few of the classical type ciphers that use a small key and
can resist long under a probable word analysis. From a consideration
of this method we can frame a test of ciphers which might be called
the acid test. It applies only to ciphers with a small key (less
than say 50 digits), applied to natural languages, and not using the
ideal method of gaining secrecy. The acid test is this: How
difficult is it to determine the key or a part of the key knowing a
sample of message and corresponding cryptogram? Any system in which
this is easy cannot be very resistant, for the cryptanalyst can
always make use of probable words, combined with trial and error,
until a consistent solution is obtained.

The conditions on the size of the key make the amount of trial and
error small, and the condition about ideal systems is necessary,
since these automatically give consistency checks. The existence of
probable words and phrases is implied by the condition of natural
languages. Conversely, it seems reasonable that if the key is
difficult to obtain, knowing a text and its cryptogram, then the
system should be strong.
Note that this requirement by itself is not contradictory to the
requirements that enciphering and deciphering be simple processes.
Using functional notation we have for enciphering

$$E = f(K, M)$$

and for deciphering

$$M = g(K, E).$$

Both of these may be simple operations on their arguments without
the third equation

$$K = h(M, E)$$

being simple.

We may also point out that in investigating a new type of ciphering
system one of the best methods of attack is to consider how the key
could be determined if a sufficient amount of M and E were given.

With a small key, the work required to solve a system, given a large
amount of data, may be expected to be not more than a few orders of
magnitude greater than the work required to obtain the key from a
small amount of data when both M and E are known.

The same principle of confusion can be (and must be) used here to
create difficulties for the cryptanalyst. Given
$M = m_1 m_2 \ldots m_s$ and $E = e_1 e_2 \ldots e_s$ the
cryptanalyst can set up equations for the different key elements
$k_1, k_2, \ldots, k_r$ (namely the enciphering equations)

$$
\begin{aligned}
e_1 &= f_1(m_1, m_2, \ldots, m_s;\; k_1, \ldots, k_r) \\
e_2 &= f_2(m_1, m_2, \ldots, m_s;\; k_1, \ldots, k_r) \\
&\;\;\vdots \\
e_s &= f_s(m_1, m_2, \ldots, m_s;\; k_1, \ldots, k_r).
\end{aligned}
$$

All is known, we assume, except the $k_i$. Each of these equations
should therefore be complex in the $k_i$, and involve many of them.
Otherwise the enemy can solve the simple ones and then the more
complex ones by substitution.

From the point of view of increasing confusion, it is desirable to
have the $f_i$ involve several $m_i$, especially if these are not
adjacent and hence less correlated. This introduces the undesirable
feature of error propagation, however, for then each $e_i$ will
generally affect several $m_i$ in deciphering, and an error will
spread to all these.

We conclude that much of the key should be used in an involved
manner in obtaining any cryptogram letter from the message to keep
the work characteristic high. Further, a dependence on several
uncorrelated $m_i$ is desirable, if some propagation of error can be
tolerated. We are led by all three of the arguments of these
sections to consider "mixing transformations."

37.  Mixing Transformations

A notion that has proven valuable in certain branches of probability
theory is the concept of a "mixing transformation." Suppose we have
a probability or measure space $\Omega$, and a measure preserving
transformation T of the space into itself, i.e., a transformation
such that the measure of a transformed region TR is equal to the
measure of the initial region R. The transformation is called mixing
if for any function f defined over the space, and any region R,

$$\lim_{n \to \infty} \int_{T^n R} f(P)\, dP = \int_R dP \int_\Omega f(P)\, dP.$$

This means that any initial region of the space R under successive
applications of T is mixed into the entire space $\Omega$ with
uniform density. In general $T^n R$ becomes a region consisting of a
large number of thin filaments spread throughout the region. As n
increases the filaments become finer and their density more nearly
constant.

An example of a mixing transformation is shown in Fig. 21. Here
measure is identified with Euclidean area. The space is the triangle
and $T_\lambda P$ is the point $\lambda$ units of distance above
point P, providing this does not go outside the triangle. When the
top of the triangle is reached a point is transferred first to the
point directly beneath, and then over to the right an irrational
fraction of the base width. If this carries the point beyond the
right edge the extra distance is measured from the left edge.
Successive transforms of a square region are shown in Fig. 21. For
$\lambda$ very large the square is turned into a uniform grating of
nearly parallel thin strips covering the triangle.

A mixing transformation in this precise sense can occur only in a
space with an infinite number of points, for in a finite point space
the transformation must be periodic. Speaking loosely, however, we
can think of a mixing transformation as one which distributes any
reasonably cohesive region in the space fairly uniformly over the
entire space. If the first region could be described in simple
terms, the second would require very complex ones. In the case of
cryptographic interest, the original region is all of a certain
simple statistical structure - after the mix the region is
distributed and the structure diffused and confused.

Good mixing transformations are often formed by repeated products of
two simple non-commuting operations. See for example the mixing of
pastry dough discussed by Hopf.* The dough is first rolled out into
a thin slab, then folded over, then rolled, and then folded again,
etc.
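The rolling and folding picture can be imitated numerically. The sketch below swaps in the so-called baker's transformation of the unit square (stretch to double width and half height, then cut and stack), which is a standard idealization and is assumed here as a stand-in; it is not claimed to be the exact operation discussed by Hopf. A small square of points is followed through twenty applications.

```python
import random

def bakers_map(x: float, y: float) -> tuple[float, float]:
    """Stretch the unit square to twice its width and half its height,
    then cut off the right half and stack it on top of the left."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, y / 2 + 0.5

def fraction_in_left_half(points) -> float:
    """Share of the points with x-coordinate below one half."""
    return sum(1 for x, _ in points if x < 0.5) / len(points)

random.seed(0)
# Start with all points inside a small square in one corner.
points = [(0.1 * random.random(), 0.1 * random.random()) for _ in range(10_000)]
print(fraction_in_left_half(points))      # every point starts in the left half

for _ in range(20):
    points = [bakers_map(x, y) for x, y in points]
print(fraction_in_left_half(points))      # close to one half after mixing
```

The initially compact region ends up spread over the whole square with nearly uniform density, which is the behavior the definition of mixing describes.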

In a good mixing transformation of a space with natural coordinates
$X_1, X_2, \ldots, X_S$ the point $X_i$ is carried by the
transformation into a point $X_i'$, with

$$X_i' = f_i(X_1, X_2, \ldots, X_S), \qquad i = 1, 2, \ldots, S$$

and the functions $f_i$ are complicated, involving all the variables
in a "sensitive" way. A small variation of any one, $X_3$, say,
changes all the $X_i'$ considerably. If $X_3$ passes through its
range of possible variation the point $X_i'$ traces a long winding
path around the space.

Various methods of mixing applicable to statistical sequences of the
type found in natural languages can be devised. One which looks
fairly good is to follow a preliminary transposition by a sequence
of alternating substitutions and simple linear operations, adding
adjacent letters mod 26 for example. Thus we might take

$$H = LSLSLT$$

where T is a transposition, L is a linear operation, and S is a
substitution.
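A product of this form can be sketched directly. Everything specific below is an assumption made for illustration: the block length, the particular transposition and substitution (drawn from a seeded random generator), the use of the same substitution twice, and the choice of L as "add each letter to its left neighbour mod 26, leaving the first letter unchanged", which keeps L reversible.

```python
import random

A = ord("A")

def transpose(nums, perm):
    """T: a fixed permutation of positions within the block."""
    return [nums[j] for j in perm]

def linear(nums):
    """L: add each letter to its left neighbour mod 26."""
    return [nums[0]] + [(nums[i] + nums[i - 1]) % 26 for i in range(1, len(nums))]

def substitute(nums, table):
    """S: a fixed simple substitution."""
    return [table[n] for n in nums]

def mix(text, perm, table):
    """H = L S L S L T, with the operations applied from right to left."""
    nums = [ord(ch) - A for ch in text]
    nums = transpose(nums, perm)
    nums = linear(nums)
    nums = substitute(nums, table)
    nums = linear(nums)
    nums = substitute(nums, table)
    nums = linear(nums)
    return "".join(chr(n + A) for n in nums)

rng = random.Random(1)
block = "THEQUICKBROWNFOX"
perm = list(range(len(block)))
rng.shuffle(perm)
table = list(range(26))
rng.shuffle(table)

a = mix(block, perm, table)
b = mix("X" + block[1:], perm, table)      # change a single input letter
changed = sum(1 for p, q in zip(a, b) if p != q)
print(a, b, changed)
```

With this two-letter linear step and only three applications of it, a single changed letter can reach at most four output positions, so a mix that is sensitive to all the variables would need wider linear operations or more alternations.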

*E. Hopf, On Causality, Statistics and Probability, Journal of
Math. and Physics, V. 13, pp. 51-102, 1934.
38.  Ciphers of the Type $T_k H S_j$

Suppose that H is a good mixing transformation that can be applied
to sequences of letters and that $T_k$ and $S_j$ are any two simple
families of transformations, i.e., two simple ciphers, which may be
the same. For concreteness we may think of them as both simple
substitutions.

It appears that the cipher THS will be a very good ciphering system
from the standpoint of its work characteristic. In the first place
it is clear on reviewing our arguments about statistical methods
that no simple statistics will give information about the key - any
significant statistics derived from E must be of a highly involved
and very sensitive type - the redundancy has been both diffused and
confused by the mixing H. Also probable words lead to a complex
system of equations involving all parts of the key (when the mix is
good), which must be solved simultaneously. The bad features of such
a system are propagation of errors and complexity of operations,
both of which get worse as the mixing of H gets better.

It is interesting to note that if the cipher T is omitted the
remaining system is similar to S and thus no stronger. The enemy
merely "unmixes" the cryptogram by application of $H^{-1}$ and then
solves. If S is omitted the remaining system is much stronger than T
alone if the mix is good, but still not comparable to THS.

The basic principle here of simple ciphers separated by a mixing
transformation can of course be extended. For example one could use

$$T_k H_1 S_j H_2 R_i$$

with two mixes and three simple ciphers. One can also simplify by
using the same ciphers, and even the same keys (inner product) as
well as the same mixing transformations. This might well simplify
the mechanization of such systems.

••/,  ■      The  mixing  transformation  which  separates  the  t\ 

>  -N  {or  more)  appearances  of  the  key  acts  as  a  kind  of .  barrier 

/>.    ti;; J**  enemy  —  it  is  easy  to  oarry  a*  known  element  over  this 
barrier  but  an  unknown  (the  key) does  not  go  easily, 

«...  ....   ,  By  supplying  two  sets  of -unknowns,  the  key  for  £ 

the  key  for  T,  and  separating  them  by  the  mixing  transform' 
H  we  have  "tangled"  the  unknowns  together  in  r  way  thrt  m«V 
solution  very  difficult, 


Although systems constructed on this principle
would be extremely safe they possess one grave disadvantage.
If the mix is good then the propagation of errors is bad.
A transmission error of one letter will affect several let-
ters on deciphering.

39. The Compound Vigenere

In the compound Vigenere several keys of lengths d_1,
d_2, ..., d_S are written under the message and added to it
modulo 26 to obtain the cryptogram. The result is a Vigenere
with key of special type, whose repetition is of period d, the
least common multiple of d_1, d_2, ..., d_S. If we have three
keys of periods 2, 3, 5 the total period is 30 and the total
key size (2+3+5) x 1.41 = 14.1 digits. The situation is then

    M   =  m_1  m_2  m_3  m_4  m_5  m_6  ...

    K_1 =  a_1  a_2  a_1  a_2  a_1  a_2  ...
    K_2 =  b_1  b_2  b_3  b_1  b_2  b_3  ...
    K_3 =  c_1  c_2  c_3  c_4  c_5  c_1  ...

    E   =  e_1  e_2  e_3  e_4  e_5  e_6  ...

with

    e_1 = m_1 + a_1 + b_1 + c_1
    e_2 = m_2 + a_2 + b_2 + c_2
    etc.
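The table above can be tried out directly. The following is a minimal sketch, assuming letters A-Z mapped to 0-25 and three made-up keys of periods 2, 3 and 5; the function name `compound_vigenere` and the sample keys and message are illustrative assumptions, not part of the memorandum.

```python
from math import lcm

A = ord("A")

def compound_vigenere(text, keys, sign=1):
    """Add (sign=+1) or subtract (sign=-1) all periodic keys modulo 26."""
    out = []
    for i, ch in enumerate(text):
        # Letter i of each key is taken cyclically, as in the table above.
        shift = sum(ord(key[i % len(key)]) - A for key in keys)
        out.append(chr((ord(ch) - A + sign * shift) % 26 + A))
    return "".join(out)

keys = ["AB", "CDE", "FGHIJ"]            # hypothetical keys, periods 2, 3, 5
period = lcm(*(len(k) for k in keys))    # 30
message = "ATTACKATDAWN"
cryptogram = compound_vigenere(message, keys)
recovered = compound_vigenere(cryptogram, keys, sign=-1)

# Damage the first cryptogram letter and count wrong letters after deciphering.
damaged = ("Z" if cryptogram[0] != "Z" else "Y") + cryptogram[1:]
errors = sum(a != b for a, b in
             zip(compound_vigenere(damaged, keys, sign=-1), message))
```

Deciphering is the same routine with the sign reversed, and a single damaged cryptogram letter disturbs only the corresponding letter of the deciphered text.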

If we assume M and E known then, letting h_i = e_i - m_i,

    a_1 + b_1 + c_1 = h_1        a_2 + b_3 + c_1 = h_6

    a_2 + b_2 + c_2 = h_2        a_1 + b_1 + c_2 = h_7

    a_1 + b_3 + c_3 = h_3        a_2 + b_2 + c_3 = h_8

    a_2 + b_1 + c_4 = h_4        a_1 + b_3 + c_4 = h_9

    a_1 + b_2 + c_5 = h_5        a_2 + b_1 + c_5 = h_10

These equations are easily solved for the key, although not as
easily as in the simple Vigenere or other simple ciphers. As
the number of constituent periods increases the solution be-
comes more involved and time consuming. In any case we have
a system of simultaneous equations each involving S of the
total of B = \sum d_i unknowns. The unicity point will occur at about
2B letters and if several times this amount of material is in-
tercepted no great difficulty should be encountered in breaking
the cipher, providing S is not more than say 6 or 8. With the
first 9 primes as periods we have a key size of 100 letters or
about 141 digits, the unicity distance is about 200 letters and
the key does not repeat for 223,092,870 letters. This system,
although much better than such methods as simple substitution,
transposition and simple Vigenere with equivalent key size,
does not utilize the available key fully in making the crypt-
analyst work for the solution. The equations only involve S
of the B key unknowns and those in a simple fashion. The
equations easily combine and reduce to eliminate unknowns. If
a large amount of material is available, compared to the unicity
distance, particular sets of equations can be combined to
eliminate unknowns very easily. The system possesses the important
advantage, however, of not expanding errors. One incorrect
letter of cryptogram produces one incorrect letter of deciphered
text.
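The figures quoted for the first 9 primes can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping: the factor log10(26), roughly 1.41 decimal digits per key letter, is the conversion used earlier in the section, and the variable names are illustrative.

```python
from math import log10, prod

periods = [2, 3, 5, 7, 11, 13, 17, 19, 23]   # the first 9 primes

key_letters = sum(periods)              # B, total number of key letters
key_digits = key_letters * log10(26)    # key size in decimal digits
unicity = 2 * key_letters               # unicity point, about 2B letters
repeat = prod(periods)                  # periods are coprime, so lcm = product

print(key_letters, round(key_digits), unicity, repeat)
```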

By relatively simple changes this system could be
strengthened considerably. If the equations for the key
elements (with M and E known) could be made into higher degree
equations rather than linear ones the difficulty of solution
would increase tremendously. This could easily be done in
a mechanical device by successive multiplications (mod 26)
of the key letters according to some prearranged scheme.

40. Incompatibility of the Criteria for Good Systems

The five criteria for good secrecy systems given in
section 12 appear to have a certain incompatibility when ap-
plied to a natural language with its complicated statistical
structure. With artificial languages having a simple statis-
tical structure it is possible to satisfy all requirements
simultaneously, by means of the ideal type ciphers. In natural
languages it seems that a compromise must be made and the
valuations balanced against one another with a view toward
the particular application.


If any one of the five criteria is dropped, the
other four can be satisfied fairly well, as the following
examples show.

1.  If we omit the first requirement (amount of secrecy)
    any simple cipher such as simple substitution will do.
    In the extreme case of omitting this condition com-
    pletely, no cipher at all is required and one sends
    the clear!

2.  If the size of the key is not limited the Vernam
    system can be used.

3.  If complexity of operation is not limited, various
    extremely complicated types of enciphering process
    can be used. The modified compound Vigenere described
    above with many different periods compounded is fairly
    satisfactory as an example here, although it falls
    down somewhat on the key size condition. Ideal systems
    and enciphered codes are also fair examples although
    not too good from the propagation of error point of
    view.

4.  If we omit the propagation of error condition systems
    of the type THS would be very good, although somewhat
    complicated.

5.  If we allow large expansion of message, various systems
    are easily devised where the "correct" message is mixed
    with many "incorrect" ones (misinformation). The key
    determines which of these is correct.

A rough argument for the incompatibility of the five
conditions may be given as follows.

From condition 5, secrecy systems essentially as
studied in this paper must be used; i.e., no great use of nulls,
etc. Perfect and ideal systems are excluded by condition 2 and
by conditions 3 and 4, respectively. The high secrecy required by
1 must then come from a high work characteristic, not from a
high equivocation characteristic. If the key is small, the
system simple, and the errors do not propagate, probable word
methods will generally solve the system fairly easily, since
we then have a fairly simple system of equations for the key.

This reasoning is too vague to be conclusive, but the
general idea seems quite reasonable. Perhaps if the various
criteria could be given quantitative significance, some sort of
an exchange equation could be found involving them and giving
the best physically compatible sets of values. The two most
difficult to measure numerically are the complexity of opera-
tions, and the complexity of statistical structure of the
language.



Appendix  1 


Deduction of -\sum p_i \log p_i

It will be shown that the measure of choice
-\sum p_i \log p_i is a logical consequence of three quite reasonable
assumptions about the desired properties of such a measure.
The three assumptions are:

(1) There exists a function C(p_1, p_2, ..., p_n), contin-
uous in the p_i, measuring the amount of "choice" when there are
n possibilities with probabilities p_i.

(2) C has the property that if a given choice be
broken down into two successive choices the total amount of
choice is the weighted sum of the individual choices. For
example, suppose the choice is from 4 possibilities A, B, C, D
with probabilities .1, .2, .3, .4. This can be broken down into
a preliminary choice between the pair A, B and the pair C, D.
Pair A, B has a total probability .1 + .2 = .3 and pair C, D
probability .3 + .4 = .7. If pair A, B is chosen a second choice
between A and B must be made with probabilities

    .1/(.1 + .2) = 1/3    and    .2/(.1 + .2) = 2/3.

If pair C, D is chosen a second choice between C and D must be
made with probabilities 3/7 and 4/7. Thus broken down we have a
preliminary amount of choice C(.3, .7) and .3 of the time a
secondary choice of C(1/3, 2/3) while .7 of the time the
secondary choice is C(3/7, 4/7). Our condition requires that
the total choice C(.1, .2, .3, .4) be the same as the
weighted sum of the different choices when decomposed, weighted
in accordance with the frequency of occurrence. Thus we require
in this case

    C(.1, .2, .3, .4) = C(.3, .7) + .3 C(1/3, 2/3) + .7 C(3/7, 4/7).

(3) If A(n) = C(1/n, 1/n, ..., 1/n), the choice
when there are n equally likely possibilities, then A(n) is
monotonic increasing in n.

Theorem: Under these three assumptions

    C(p_1, p_2, ..., p_n) = -K \sum p_i \log p_i

where K is a positive constant.
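The four-possibility example used to state assumption (2) can be checked numerically against the formula in the theorem. The sketch below takes K = 1 with base-2 logarithms (an arbitrary choice of unit) and uses an illustrative function name `choice`; it only confirms that the formula has properties (2) and (3) in these instances and says nothing about uniqueness, which is what the proof establishes.

```python
from math import isclose, log2

def choice(*p):
    """Measure of choice -sum p_i log2 p_i (K = 1, unit: binary digits)."""
    return -sum(q * log2(q) for q in p if q > 0)

# Direct four-way choice versus the two-stage decomposition of assumption (2).
direct = choice(.1, .2, .3, .4)
staged = choice(.3, .7) + .3 * choice(1/3, 2/3) + .7 * choice(3/7, 4/7)

# A(n) for n equally likely possibilities, as in assumption (3).
A = [choice(*([1 / n] * n)) for n in range(1, 9)]
```

Both routes give the same amount of choice, and A(n) comes out as log2 n, increasing with n.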




From condition (2) we can decompose a choice from s^m equally
likely possibilities into a series of m choices each from s
equally likely possibilities and obtain

    A(s^m) = m A(s)

Similarly

    A(t^n) = n A(t)

We can choose n arbitrarily large and find an m to satisfy

    s^m \le t^n < s^{m+1}

Thus, taking logarithms and dividing by n log s,

    m/n \le \log t / \log s \le m/n + 1/n    or    | m/n - \log t / \log s | < \epsilon

where \epsilon is arbitrarily small.
Now from the monotonic property of A(n)

    A(s^m) \le A(t^n) \le A(s^{m+1})

    m A(s) \le n A(t) \le (m + 1) A(s)

Hence, dividing by n A(s),

    m/n \le A(t)/A(s) \le m/n + 1/n    or    | m/n - A(t)/A(s) | < \epsilon

    | A(t)/A(s) - \log t / \log s | \le 2\epsilon        A(t) = K \log t

where K must be positive to satisfy (3).

Now suppose we have a choice from n possibilities with commen-
surable probabilities p_i = n_i / \sum n_i where the n_i are integers. We
can break down a choice from \sum n_i possibilities into a choice
from n possibilities with probabilities p_1, ..., p_n and then, if
the ith was chosen, a choice from n_i with equal probabilities.
Using condition 2 again, we equate the total choice from \sum n_i
as computed by two methods

    K \log \sum n_i = C(p_1, ..., p_n) + K \sum p_i \log n_i

Hence

    C = K [ \sum p_i \log \sum n_i - \sum p_i \log n_i ]
      = -K \sum p_i \log (n_i / \sum n_i) = -K \sum p_i \log p_i

If the p_i are incommensurable, they may be approximated by
rationals and the same expression must hold by our continuity
assumption. The choice of coefficient K is a matter of conven-
ience and amounts to the choice of a unit of measure.
Appendix 2


Proof of Theorem 4

Select any message M_1 and group together all crypto-
grams that can be obtained from M_1 by an enciphering operation
T_i. Let this class of cryptograms be C_1'. Group with M_1 all
M_s that can be obtained from M_1 by T_i^{-1} T_j M_1, and call this
class C_1. The same C_1' would be obtained if we started with any
other M in C_1 since

    T_s T_j^{-1} T_i M_1 = T_l M_1.

Similarly the same C_1 would be obtained.

Choosing an M (if any exist) not in C_1 we construct
C_2 and C_2' in the same way. Thus we obtain the residue classes
with properties (1) and (2). Let M_1 and M_2 be in C_1 and suppose

    M_2 = T_1^{-1} T_2 M_1

If E_1 is in C_1' and can be obtained from M_1 by

    E_1 = T_\alpha M_1 = T_\beta M_1 = ... = T_\eta M_1

then

    E_1 = T_\alpha T_2^{-1} T_1 M_2 = T_\beta T_2^{-1} T_1 M_2 = ...

        = T_\lambda M_2 = T_\mu M_2 = ...

Thus each M_i in C_1 transforms into E_1 by the same number of keys.
Similarly each E_i in C_1' is obtained from any M in C_1 by the same
number of keys. It follows that this number of keys is a divisor
of the total number of keys and hence we have properties (3) and
(4).
Appendix 3


Equivocation of Message for Random Cipher

As before let M_1, ..., M_S be high probability messages
and M_{S+1}, ..., M_H have zero probability. Let P(m_i, m) be the
probability of just m_i lines going from a particular E, say E_1,
to a particular high probability M, say M_i, with a total of m
lines to all high probability M. Then


    P(m_i, m) = \binom{k}{m} \binom{m}{m_i} (1/H)^{m_i} ((S-1)/H)^{m - m_i} (1 - S/H)^{k - m}

The probability of intercepting an E with m lines to high proba-
bility M's is

    \frac{H}{Sk} m \binom{k}{m} (S/H)^m (1 - S/H)^{k - m}


The Q(M) expected can be thought of as contributed to by the
various M_i in the high probability group. Thus M_i contributes

    -(m_i/m) \log (m_i/m) = (m_i/m) \log (m/m_i)

if there are m_i lines to M_i and a total of m to high probability
M's. The expected Q is then

    Q(M) = H S \sum_{m_i} \sum_m P(m_i, m) \frac{m}{Sk} \frac{m_i}{m} \log \frac{m}{m_i}

The factor H sums over the various E_i and the S sums over the
different M_i (i = 1, ..., S). Hence,

    Q(M) = \frac{H}{k} \sum P(m_i, m) m_i [ \log m - \log m_i ]


The term

    \sum P(m_i, m) m_i

summed on m_i, gives the expected m_i when m lines go to high
probability M's, i.e., m/S. Hence the first term is

    \frac{H}{Sk} \sum_m m P(m) \log m = Q(K)

by our previous work. The second term is

    -\frac{H}{k} \sum P(m_i, m) m_i \log m_i

If the expected m_i is << 1 this term is small since it vanishes
for m_i = 0 or 1. The expected m_i is k/H. Thus beyond this
point Q(M) approaches closely to Q(K). The point in question
is where |K| = |M_0| = R_0 N

or

    N = |K| / R_0


If the expected m_i >> 1 the log m_i can be taken out as log \bar{m}_i =
log k/H and we have

    -\frac{H}{k} \log \frac{k}{H} \sum P(m_i, m) m_i = -\log \frac{k}{H} = |M_0| - |K|

In this region then

    Q(M) = |M_0| - |K| + Q(K)

but here Q(K) = |K| - |M_0| + |M|, and therefore

    Q(M) = |M| = RN

In the transition region \bar{m}_i is about 1 and H will in
ordinary cases be very large. It is admissible then to replace
P(m_i, m) by P(m_i), since this will not depend on m to any extent
except for values of m of very small probability. Thus we obtain
for this region

    Q(M) - Q(K) = -\frac{H}{k} \sum P(m_i) m_i \log m_i

The sum has the same form as our expression for Q(K) but with
1/H in place of S/H. The calculations for Q(K) can be used,
therefore, with only a change of the N scale by a factor of
Appendix 4


Key Appearance in Simple Substitution with Independent Letters

If successive letters are chosen independently and
the different letters have probabilities p_1, p_2, ..., p_S, we can
calculate the expected number of different letters when N
letters have been intercepted. It is

    L(N) = S - \sum_i (1 - p_i)^N

To prove this, imagine all possible sequences of N letters
written down, each with a frequency corresponding to its proba-
bility, giving a total of say A sequences. Letter 1 does not
appear in (1 - p_1)^N A of these; letter 2 does not appear in
(1 - p_2)^N A, etc. Therefore the total number of letters missing
from sequences is

    A \sum_i (1 - p_i)^N

Dividing by A gives us by definition the expected number of
missing letters from a random sequence, \sum_i (1 - p_i)^N. The number
of different letters expected in a sequence is the total number
of letters S minus this, giving the desired result.

If all the p_i are equal this reduces to S - S(1 - 1/S)^N,
an exponential approach to S. In the general case there is a
series of exponentials with different time constants, corre-
sponding to different p_i, which are added to give L(N).
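The formula for the expected number of different letters is easy to compare with a direct simulation. The sketch below uses a made-up four-letter alphabet with probabilities .4, .3, .2, .1 rather than English frequencies, and the function names are illustrative; it is a sanity check under those assumptions, not a reproduction of Fig. 25.

```python
import random

def expected_distinct(p, n):
    """S minus the expected number of letters missing from n independent draws."""
    return len(p) - sum((1 - q) ** n for q in p)

def simulated_distinct(p, n, trials=2000, seed=0):
    """Average number of different letters seen in `trials` random sequences."""
    rng = random.Random(seed)
    letters = range(len(p))
    total = 0
    for _ in range(trials):
        total += len(set(rng.choices(letters, weights=p, k=n)))
    return total / trials

p = [0.4, 0.3, 0.2, 0.1]            # hypothetical letter probabilities
formula = expected_distinct(p, 5)
estimate = simulated_distinct(p, 5)

# Equal probabilities reduce to S - S(1 - 1/S)^N, here with S = 26.
equal_case = expected_distinct([1 / 26] * 26, 10)
```

With equal probabilities the curve is a single exponential approach to S; with unequal probabilities the rare letters contribute the slow exponentials.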

With the frequencies of normal English used for the
p_i, we obtain the curve shown in Fig. 25, along with an experi-
mental curve. The small discrepancy can be attributed to
influences of nearby letters. (In English there is less tendency
to double letters than there would be if the letters were inde-
pendent but with the same probabilities. For English the proba-
bility of a doubled digram is

    \sum_i P(i, i) = .0315

while if letters were independent it would be

    \sum_i p_i^2 = .0670.)
A Theoretical Case Where All Invariant Statistics of E Are
Independent of K.

By an invariant statistic of a sequence of letters
E = ..., e_{-2}, e_{-1}, e_0, e_1, e_2, e_3, ... we will mean a statistic
which is averaged along the length of the sequence E. More
precisely a statistic of the form:

    \lim_{n \to \infty} \frac{1}{2n+1} ( F(E_{-n}) + ... + F(E_{-1}) + F(E) + F(E_1) + F(E_2) + ... + F(E_n) )

where F is any function whose argument is a possible sequence, and
E_{\pm n} is the sequence E shifted n letters to the right or left.
Such statistics as the relative frequency of a given letter, of
a given n-gram, transition frequencies, and frequencies with
which letter i is followed by letter j at a distance n are all
invariant.

We will describe a system in which every invariant
statistic which the cryptanalyst can construct from the (infinite)
intercepted E is independent of both K and M, and thus gives no
information to him. This effect and still more occurs with the
ideal ciphers of course, but here it is obtained independently of
the original message statistics and without any matching of the
cipher to the language.

Let N be a "random" sequence of letters:

    N = ..., n_{-2}, n_{-1}, n_0, n_1, n_2, n_3, ...

This is supposedly a known sequence (to the enemy) and thus a
part of the system, not of the key. Apply any simple cipher to
the message and then add N letter by letter to the result (mod
26). The "sum" is the enciphered message. It is evident that
any invariant statistic of E will be (with probability 1) the
same as that for a random sequence. Hence it is independent
of both K and M.

We need hardly add that such a system is easily
broken - the enemy merely subtracts N from E and then solves
the simple residual cipher, which may often be done with
invariant statistics.
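A small experiment illustrates both halves of this remark. The sketch below makes several assumptions that are not in the text: letters are the integers 0-25, the "simple cipher" is a Caesar shift by 3, the message is the extreme case of one letter repeated, and N is produced by a seeded pseudo-random generator standing in for a truly random sequence.

```python
import random
from collections import Counter

def add_mod26(a, b, sign=1):
    """Letter-by-letter sum (sign=+1) or difference (sign=-1) modulo 26."""
    return [(x + sign * y) % 26 for x, y in zip(a, b)]

rng = random.Random(1)
message = [0] * 26000                       # as non-random as possible: all "A"
simple = [(x + 3) % 26 for x in message]    # hypothetical simple cipher
N = [rng.randrange(26) for _ in message]    # known "random" sequence
E = add_mod26(simple, N)

letter_freq = Counter(E)                    # close to uniform, whatever M and K are
residual = add_mod26(E, N, sign=-1)         # the enemy subtracts the known N
```

The letter frequencies of E are flat even for this degenerate message, yet subtracting N returns the simple cipher text exactly.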
Maximum Repetition Rate in Compound Systems for a Given Total
Key Size

We consider briefly the question of how to arrange the
component periods in a compound Vigenere or Transposition in order
to obtain the longest period for a given total key size. If the
component periods are P_1, P_2, ..., P_s it is clear that they should
be coprime. Otherwise the total key, which is \sum P_i, could be re-
duced without changing the period, which is the least common
multiple of the P_i, merely by deleting a factor which appears in
several of the P_i from all but one. Also each P must be a power
of a prime, for if it contains two primes, it can be divided into
these parts, reducing the key and not affecting the period. Thus
the component periods are selections from the series of primes
and powers of primes:

    (A)  2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25, 27, ...

the selection being pairwise coprime.

It appears from empirical evidence that the best set
of component periods for a given total size S is found by the
following process.

1.  Determine the largest M such that \sum_{i=1}^{M} p_i \le S where the
    p_i are the primes in increasing order. This is the
    maximum number of periods where the periods are co-
    prime, and is the number of periods to be used.

2.  Choose from the sequence A, M elements, consecutive
    except for the fact that no prime is represented more
    than once, the M elements being as great as possible
    with sum \le S.

3.  If the sum is < S move as many as possible of the largest
    elements in this block up a notch in the sequence,
    still satisfying the conditions on the sum and copri-
    mality.

4.  Repeat 3 to either part of the original block if pos-
    sible. This process eventually ends and apparently
    gives the proper decomposition.

For example with S = 50, the sum of the first 6
primes is 41, of the first 7 is 58. Hence 6 periods
will be used. We have

    11 + 9 + 8 + 7 + 5 + 3 = 43

    13 + 11 + 9 + 8 + 7 + 5 = 53

hence we start with the block 11, 9, 8, 7, 5, 3. The
elements 11, 9, 8, 7 can be moved up a notch
to give

    13 + 11 + 9 + 8 + 5 + 3 = 49

No further improvement seems possible. We obtain

    P = 13 x 11 x 9 x 8 x 5 x 3 = 154,440
The products and sums of the first n primes are given below.

    n          1   2   3    4     5      6       7        8          9

    p_n        2   3   5    7    11     13      17       19         23

    Sum        2   5  10   17    28     41      58       77        100

    Product    2   6  30  210  2310  30030  510510  9699690  223092870
Introduction.

In classical mechanics one considers situations
where the state of a system is described by a set of numbers,
the coordinates of the phase space of the system, and the
dynamical behavior is controlled by a set of ordinary differ-
ential equations. Such a system is entirely determinate; the
future is completely specified by the present state and the
dynamical equations, since these differential equations have,
in general, a unique solution passing through a given point.

In other branches of physics (heat flow, Brownian
motion, diffusion etc.) there are situations which can be called
completely statistical. The path of a particle of gas is
described only statistically and no determinate or mean behavior
occurs. In this case one studies the flow of probability which
is described by a partial differential equation of the heat
flow type.

The present memorandum discusses a partial differen-
tial equation in which both effects occur - there is a definite
"mean" motion of a system determinate in character, carrying
its representative point through phase space in the classical
manner with a superimposed statistical effect continually per-
turbing it from this path.


In such a case the future coordinates of the system
cannot be precisely predicted; only a probability distribution
function can be determined for the future time whose value
times the volume element dτ is the probability that the system
will be in the volume element dτ around the point in question.
For a short time the system is substantially determinate, the
distribution being concentrated around a point which moves ac-
cording to the determinate part of the equation. As the statis-
tical effects come into play this distribution broadens out and
in general approaches a limiting distribution which is indepen-
dent of the initial state of the system.
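The behavior described here can be seen in a small simulation. The sketch below is an illustration under assumptions of its own, not the equation of the memorandum: a one dimensional system with determinate part dx/dt = -beta x, disturbed by small independent Gaussian kicks of constant strength k, stepped by the Euler-Maruyama method. The names `beta`, `k`, `dt` and `simulate` are hypothetical.

```python
import random
from statistics import mean, pvariance

def simulate(x0, beta=1.0, k=0.5, dt=0.025, steps=400, paths=1000, seed=0):
    """Follow many copies of the system from x0; return final mean and variance."""
    rng = random.Random(seed)
    kick = (2 * k * dt) ** 0.5            # size of one random disturbance
    finals = []
    for _ in range(paths):
        x = x0
        for _ in range(steps):
            # Determinate drift toward zero plus a small random perturbation.
            x += -beta * x * dt + kick * rng.gauss(0.0, 1.0)
        finals.append(x)
    return mean(finals), pvariance(finals)

mean_a, var_a = simulate(x0=3.0)
mean_b, var_b = simulate(x0=-3.0, seed=1)
```

Under these assumptions the two ensembles start at opposite points yet end with nearly the same mean (about zero) and variance (about k/beta), which is the independence of the limiting distribution from the initial state.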

In some respects the situation is similar to that in
quantum mechanics, where systems are described only by probabili-
ties (or more precisely by wave functions whose squared amplitudes
are probabilities). There is this difference however; in quantum
mechanics even the initial state cannot be precisely described
due to the uncertainty principle. Conjugate variables cannot
both be measured simultaneously with exactness. In the systems
we consider here there are assumed to be no difficulties of this
nature - all coordinates can be simultaneously and precisely
measured. This corresponds to the difference in the fundamental
equation from that of quantum mechanics - Schrödinger's equation is
for the wave function ψ, while the equation considered here deals
directly with the probability density. Thus the present work is
adapted to "molar" statistical situations.


This sort of analysis may be expected to apply to
many problems where the actual situation is quite complicated
but a partial theoretical analysis is possible. This partial an-
alysis is used for the determinate part of the equation, and
the other complex disturbing effects treated statistically.
Such situations may occur in economics, sociology, history, etc.
as well as in many engineering and physical problems.

G. R. Stibitz in a series of memoranda has considered
a similar problem in connection with the stability of a periodically
closed servo system. In his case the phase space of the system
consisted of a set of discrete points, and the fundamental
equation is a difference equation. In the case considered here
(which was suggested by Stibitz' work) the variables are continuous
and a differential equation is involved.


In a determinate system with an n dimensional phase
space, whose motion is described by differential equations, we have

    dx^i/dt = f^i(x^1, x^2, ..., x^n),    i = 1, 2, ..., n        (1)

where the x^i are coordinates in the phase space and t is time.

If we start with a probability distribution of points in phase space

    P(x^1, x^2, ..., x^n, t)

giving the probability density in the differential volume element
about x^1, ..., x^n at time t, this distribution changes with time.
Its motion is described by the partial differential equation

    \partial P/\partial t = -\sum_i \partial (P f^i)/\partial x^i

or in tensor notation

    \partial P/\partial t = -(P f^i)_{,i}

This is evident if we think of P as a fluid density whose velocity
field is f^i.

Now suppose that as the representative point of the
system moves about the phase space it is continually subject to
small disturbances, which are of a probability type. Thus the
system tends to follow the solution of (1) but is continually
being disturbed by the probability effects, which may be thought
of as something like molecular collisions of the surrounding gas
on a certain particle. We are interested in the limiting case
where the disturbing effects are very rapid but very small. If
we assume that the disturbance is homogeneous and isotropic,
this can be represented by an additional term in the equation of
the heat flow type

    K \nabla^2 P

In the more general case certain directions may be preferred, and
certain regions may have greater perturbation effects. Thus there
will generally be a small ellipsoid of probability about each point,
and a corresponding positive definite quadratic form

defined over the phase space. This form describes the local
statistical perturbing effects, for each point.
The equation then assumes the form

This partial differential equation governs the flow of probability
in the phase space. With an ensemble of systems distributed at
t = 0 according to P_0(x^i)

the distribution at a later time t_1 is the solution of (1) for

The equation (1) is linear and of parabolic type (in t).
In the x^i it is elliptical, since a^{ij} is positive definite.




The total probability in the phase space remains constant, for if
we let


/  (a1*  5^  ♦  *«    •  « 


tfco  latogral  boia*  ow  o  *  xffUi*aUy  Xar*o  oarfaoo,  ud  ^  t&o 
volt  awaalt 

If a^{ij} is positive definite and both a^{ij} and f^i
are continuous in the phase space the distribution P approaches
a unique limit as t -> \infty. This limit is either zero everywhere,
the probability spreading to infinity, or a definite limiting dis-
tribution P* with

for any P_0.

The limiting distribution must satisfy the elliptical
equation obtained by setting \partial P/\partial t = 0.

To  nuom  tact  the  aiitrihution  epproaohea  a  Halt  let 
P1  and  ?g    ee  two  different  solution*  of  ID.    Titea  the  dif- 
ference  o,  -  ?A  -  P^   al«o  satiafia*  the  equation  aad  ^  la 
poaltive  la  oaa  region  B  and  negative  la  tae  raaaladar  at  tae 
apace.    Consider  tae  cuani-ity 

U  auat  deer  ease  for 

where  S  la  tae  surface  of  tae  reeioa  B  aad  T  la  tae  outward 
Telooity  of  tale  ear  face.    Since  Q  vanishes  an  the  surface,  tae 
aeooad  tern  la  aero,  aad  tae  first  la 


Toluae  iategrale  of  diYeraaaeea  aad  traaafora  aj  tae 

i 

usual  theorems  lato  surface  integrale 

V 

tae  aeooad  tera  age  la  vanishes  alace  Q  -  0  on  S.  la  tae  first 
term  «A  la  la  tae  direction  of  ^   a©  at  any  point  we  have 


<  0 


Tims  a aj  initial  distribution 


?a  «4  ?j  H  dearaaaia«. 
•BprMMM  t*»  MM  Xiait. 


i 


•  I  I* 


It   «^    is  SeuiUiMOOS,  *ftt       tots  ft  <U»«aatHuiUyt 

PwiH  b#>  o&u  lienors,  sad  tfcs  ▼sotor  SUE  ftl— aa  i  t— tsassj » 

Ths  saouat  of  tiiia  di««oatiault/  Is  £U  «&  fcy 
ft1*        -  ?j)  •  -  If*  -  ?*)  » 

*frtr«  tht  b***sd  «a4  uafcsjrr  »d  l«n«r*  *****  ts>  ti»«  two  tide* 
of  t&«  dltesoiiiuUt/.  Tims 

SMMyiftlsai  Aft  Mm  *»a  i1£m  o#  s*sft  i  1  nana** ****** g>gj - 

Xft  tSM  sUpisst  Oft«  &l»«ASiS*%l  •*»*  wft  fcm 


If  wo  «tort  with  ft  «opiko*  of  prooaoilitr  ioaaUaoa 
at  oao  point,  ta«  I— tllato  aoaowiar  aaa  bo  aaaarlaoa  la  oittjOo 
tor  a*,    aoar  talt  poUt  wa  **r  ohaaao  a1*  aad  f1  to  bo  aoaotaat. 
Do»  to  tao  f1  tao  aolxo  otartt  «crln«  vita  a  ▼•lojUy/*,  9111141 
too  pro»«oUltr  tors  a1*  •pr«*de  it  out.    If  wo  oottt  wUtt«i 
fro*  af  to 

wo  aooo  - 

*  '  „  „.  - "' 

and the equation becomes


This is the equation for heat flow in an anisotropic medium.
Thai  ia  ftao  y*  aooraiooto  too  «»i*o  dlffaooa  out  lata  a  mwu&m 
al»%rlb*tlaa  *ita  qoaArotU  form  a**|  for  th«  firot  afcort  iatorroi 
of  tiaa 


where A_{ij} is the inverse form of a^{ij}


feliaa  Toioauy  rial*  gaj  aom^aaaaoaa  at*u«ti«ai  .wta. 

Om  portioalo?  mm  of  la  tor  tot  1*  ttei  la  w&iaa 

is  tUo  opooo.   ?at  a  oao  a&aooslaaal  aaaaa  opoco,tfeo  a$uatlaa  U) 
taaa  aaoaaao  ta«  faxa 

A  coaoxal  solution  far  tola  o*§o  &«s  *soa  foa&u   It  a*?  *o  dosaria  aa 
*a  mxoi>a*   It  wae  laltlol  41*t*iteatl©a  i»  a   s   foactioa,  aa  taa 
sjrataa  (or  0^aeabJL«)  ia  fcaooo  to  aaaa  a  daflalta  talus  at  x  at 
t  *  0,  say  P$   taaa  at  \±  taa  diatribe  Uoa  is  aoraal*   ?ao  saatax 

aM^a  aa^MP  ^^^W^ft^^rd  IsV^^^aa^aV^^Oj^    ^9    s^-$  jjj^L^WW^ 

Taus  taa  attn  £ oaroaaas  alaaa,  taa  ium  suits  aa  taa  aystoa  aaaid 
follow  am  taa  atatiattaal  sff oata  aasaat*   Hm  tarlaaaa  a* 
iaoraaaoa  axyaaaatiaUy  to  a  Ualtia*  taiaa  a/a  aita  aalf  taa  tlaa 

to  ay  ova  taat  taia  la  taa  aalatlaa  it  la  oaly  aaaosaojy 
to  saastitats  la  taa  oqoatiea  (*) ,  k*  t  —a*      too  tiatrisatloa 
approaafcaa  a  normal  aao  saatarad  aa  ««ro  ultn  a*  «  a/a* 


M  •   |U  -  of*) 

«*  »  $  (1  •  O****) 

«iu  oa  oroitrarr  iaitioi  aiotritaUoo  ?aU)  too  oolottoo  ono  bo 
written  *•  ma  mte*r&l  ««lo«  U&*  aotooo  of  lu^iiUm  of  keo* 
flow  9robl«gt»« 

•  /  **m  * 

foe  eeoe  teaerol  rooolto  aoX4  la  toe  I  aiooaeioaal 
I*hi  wh*$i       it  i  ltft»»y  fere  *&d  e^  1a  eooetnat*    A  *OollEO# 
of  probability  eroo&eaa  iAte  o  oorool  Aletrleotleo*  toe  ooefte* 
folio* la*  tfit  dtlsrslMU  trejeetery  oad  toe  qooArotlO  for* 
vfeloh  tekeo  toe  jtliot  of  the  etaoaor4  eOriatloo  toMMNOooi  eat* 
oeoeatioUy  towt  o  eef  Ulte  limit.   *ae  eveloeties  of  too 
e  one  tea  to  io  obob  aero  eoopUeat**  1*  tale  eeoe  oeeew,  ftoe 
DATA SMOOTHING AND PREDICTION IN FIRE-CONTROL SYSTEMS


By R. B. Blackman, H. W. Bode, and C. E. Shannon ᵃ

ᵃ Bell Telephone Laboratories.

THE problem of data smoothing in fire control arises because
observations of target positions are never completely accurate. If the
target is located by radar, for example, we may expect errors in range
running from perhaps 10 to 50 yards in typical cases. Angular errors
may vary from perhaps one to several mils, corresponding, at
representative ranges, to yardage errors about equal to those mentioned
for range. Similar figures might be cited for the errors involved in
optical tracking by various devices. Evidently these errors in
observation will generate corresponding errors in the final aiming
orders delivered by the fire-control system.

A data-smoothing device is a means for minimizing the consequences of
observational errors by, in effect, averaging the results of
observations taken over a period of time. The simplest example of data
smoothing is furnished by artillery fire at a fixed land target. Here
the principal parameter is the range to the target. While individual
determinations of the range may be somewhat in error, a reliable
estimate can ordinarily be obtained by taking the simple average of a
number of such observations. This example, however, is scarcely a
representative one for problems in data smoothing generally. The errors
involved are small and the averaging process is an elementary one.
Moreover, the data-smoothing process is not of very decisive importance
in any case, since any errors which may exist in the estimated range
can normally be wiped out merely by observing the results of a few
trial shots.

More representative problems in data smoothing arise when we deal with
a moving target. In this case errors in observational data may be much
more serious, since they determine not only the present position of the
target but also the rates used in calculating how much the target will
move during the time it takes the projectile to reach it. An
illustration is furnished by antiaircraft fire against distant
airplanes. Suppose, for example, that in observing the target's
position we make two errors of opposite sign and a second apart, of
25 yards each. Then the apparent motion of the airplane is in error by
50 yards per second. Since the time of flight of an antiaircraft shell
in reaching its target may be as high as 30 seconds or more, such an
error might produce a miss of the order of 1 mile. It is clear that in
any comparable situation the effect of observational errors in
determining the target rate will be much greater than the position
error alone would suggest, and the function of the data-smoothing
network in averaging the data so that even moderately reliable rates
can be obtained as a basis for prediction becomes a critically
important one.

Aside from magnifying the consequences of small errors in target
position, the motion of the target complicates the data-smoothing
problem in two other respects. The first is the fact that it gives us
only a brief time in which to obtain suitable firing orders. The total
engagement is likely to last for only a brief time, and in any case it
is necessary to make use of the data before the target has time to do
something different. Thus the averaging process cannot take too long.
The second complication results from the fact that the true target
position is an unknown function of time rather than a mere constant.
Thus many more possibilities are open than would be the case with fixed
targets, and the problem of averaging to remove the effects of small
errors is correspondingly more elusive.
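
The arithmetic of the antiaircraft illustration can be checked with a short sketch. The function names and the 30-second time of flight used below are illustrative assumptions; only the 25-yard errors, the one-second spacing, and the resulting 50 yards per second come from the text.

```python
YARDS_PER_MILE = 1760

def rate_error(err_first, err_second, dt):
    """Apparent-velocity error (yd/s) from two position errors (yd) taken dt seconds apart."""
    return (err_second - err_first) / dt

def miss_distance(rate_err, time_of_flight):
    """Miss (yd) produced by extrapolating a rate error over the time of flight (s)."""
    return rate_err * time_of_flight

# Two 25-yard errors of opposite sign, one second apart.
v_err = rate_error(-25.0, 25.0, 1.0)

# Assumed time of flight of 30 seconds (illustrative value).
miss = miss_distance(v_err, 30.0)

print(f"rate error: {v_err} yd/s")
print(f"miss: {miss} yd = {miss / YARDS_PER_MILE:.2f} mile")
```

The position errors themselves are only 25 yards, but the rate derived from them, carried over the time of flight, gives a miss of the order of a mile.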

The intimate relation between data smoothing and target mobility
explains why the data-smoothing problem is relatively new in warfare.
The problem emerged as a serious one only recently, with the
introduction of new and highly mobile military devices. The airplane
is, of course, the archetype of such mobile instruments, and we have
already mentioned the data-smoothing problem as it appears in
antiaircraft fire. Since the relative velocity of airplane and ground
is the same whether we station ourselves on one or the other, however,
the mobility of the airplane produces essentially the same sort of
problem in the design of bombsights also. Another field exists in
plane-to-plane gunnery. Although they are somewhat slower, the mobility
of such vehicles as tanks and torpedo boats is still considerable
enough to create a serious problem here also. Future examples may be
centered largely on robot missiles. It is interesting to notice that a
guided missile may present a problem in data smoothing either because
it belongs to the enemy, and is therefore something to shoot at, or
because it belongs to us, and requires smoothing to correct errors in
the data which it uses for guidance. The tendency to higher and higher
speeds in all these devices must evidently mean that fire control
generally, and data smoothing as one aspect of fire control, must
become more and more important, unless war making can be ended.

Very mobile instruments of war, such as the airplane, began to make
their appearance in World War I, but there was insufficient time during
that war to make much progress with the fire-control problems which
such instrumentalities imply. In the interval between World War I and
World War II, however, a considerable number of fire-control devices,
such as bombsights and antiaircraft computers, were developed. The
principal attention in the design of these devices, however, was on the
kinematical aspects of the situation. Although a number of them
included fairly successful methods of minimizing the effects of
observational errors,ᵇ it seems fair to say that in the interval
between the two wars there was no general appreciation of the existence
of the data-smoothing problem as such.

It follows that the theory of data smoothing advanced in this monograph
is the result principally of experience gained in World War II. More
specifically, it is the product of the experience of the authors with a
series of projects, largely sponsored by Division 7 of NDRC, concerned
with the design of electrical antiaircraft directors. In addition, it
draws largely on the results of a number of other investigations, also
NDRC sponsored. The possible key importance of data smoothing in the
design of fire-control systems was recognized by Division 7 early in
the course of its activities and the emphasis placed upon it in a
number of projects led to the accumulation of a much larger body of
results than might otherwise have been obtained.

ᵇ Most of these solutions depended upon the use of special types of
tracking systems. Examples are found in the use of regenerative
tracking in bombsights and antiaircraft computers or in the
determination of rates from a precessing gyroscope or an aided laying
mechanism in an antiaircraft tracking head. So far as their effect on
the data-smoothing characteristics of the overall circuit is concerned,
these devices are equivalent to simple types of smoothing networks
inserted directly in the prediction system. This is discussed in more
detail under the heading "Exponential Smoothing," Section 10.1.

Data smoothing is developed here in terms of concepts familiar in
communication engineering. This is a natural approach since data
smoothing is evidently a special case of the transmission,
manipulation, and utilization of intelligence. The other principal, and
perhaps still more fundamental, approach to data smoothing is to regard
it as a problem in statistics. This is the line followed in the classic
work¹ by Norbert Wiener.ᶜ For reasons which are brought out later,
Wiener's theory is not used in the present monograph as a basis for the
actual design of data-smoothing networks. Because of its fundamental
interest, however, a sketch of Wiener's theory is included. The
authors' apologies are due for any mutilation to the theory caused by
the attempt to simplify it and compress it into a brief space.

The present monograph falls roughly into two dissimilar halves. The
first half, consisting of the first three or four chapters, includes a
discussion of the general theoretical foundations of the data-smoothing
problem, the best established ways of approaching the problem, the
assumptions they involve, and the authors' judgment concerning the
assumptions which best fit the tactical facts. In this part may also be
included the last chapter, which contains a fragmentary discussion of
alternative data-smoothing possibilities lying outside the main
theoretical framework of the monograph.

The rest of the monograph is concerned with the technique of designing
specific data-smoothing structures. A fairly elaborate and detailed
treatment is given here, in the belief that the problem of actually
realizing a suitable data-smoothing device is, in some ways at least,
as difficult as that of deciding what the general properties of such a
device should be. The technique, as given, draws heavily upon the
highly developed resources of electric network theory. For this reason
the discussion is couched entirely in electrical language, although the
authors realize, of course, that equivalent nonelectrical solutions may
exist. For the benefit of readers who may not be familiar with network
theory, the monograph includes an appendix summarizing the principles
most needed in the main text.

ᶜ Wiener is also responsible for providing tools which permit the gap
between the statistical and communication points of view to be bridged.

Two further remarks may be helpful in understanding the monograph. The
first concerns the relation between data smoothing and the overall
problem of prediction in a fire-control circuit. These two are coupled
together in the title of the monograph, and it is clear that the
connection between them must be very close, since, as we saw earlier,
small irregularities in input data are likely to be serious only as
they affect the extrapolation used to determine the future position of
a moving target. In the statistical approach, in fact, data smoothing
and prediction are treated as a single problem and a single device
performs both operations.

In the attack which is treated at greatest length in the monograph a
certain distinction between data smoothing and prediction can be made.
To simplify the exposition as much as possible, the explicit discussion
in the monograph is directed principally at data smoothing. This,
however, is not intended to suggest that there is any real cleavage
between the two problems or that the analysis as developed in the
monograph does not also bear, by implication, upon the prediction
problem. Any theory of data smoothing must rest ultimately upon some
hypothesis concerning the path of the target, and the exact statement
of the assumptions to be made is in many ways the most important as
well as the most difficult part of the problem. The same assumptions,
however, are also involved in the extrapolation to the future position
of the target. It is thus impossible to solve the data-smoothing
problem without also implying what the general nature of the prediction
process will be. For example, the formulation given in Chapter 9
amounts to the assumption that the target path is specified by a set of
geometrical parameters corresponding to components of velocity,
acceleration, etc. The data-smoothing process centers about the problem
of obtaining reliable values for these parameters. To obtain a complete
prediction thereafter, it is merely necessary to multiply the parameter
values thus obtained by suitable functions of time of flight and add
the results to the present position of the target.
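
The last step can be sketched in a few lines. The text says only that the parameters are multiplied by "suitable functions of time of flight"; the particular multipliers used below (t_f for velocity and t_f²/2 for acceleration, as in constant-acceleration kinematics) and the name `predict` are assumptions made for illustration.

```python
def predict(position, velocity, acceleration, t_f):
    """Future position = present position + sum of (parameter * function of time of flight)."""
    params = (velocity, acceleration)
    # Assumed multipliers: t_f for velocity, t_f**2 / 2 for acceleration.
    multipliers = (t_f, 0.5 * t_f ** 2)
    return position + sum(p * m for p, m in zip(params, multipliers))

# Hypothetical smoothed parameter values for one coordinate.
x_future = predict(position=1000.0, velocity=-50.0, acceleration=2.0, t_f=10.0)
print(x_future)
```

Once smoothing has produced reliable parameter values, the prediction itself is a fixed weighted sum, so the quality of the prediction rests almost entirely on the quality of the smoothed parameters.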

The other general remark concerns the tactical criteria used in
evaluating the performance of a data-smoothing system. This turns out
to be one of the most important aspects of the whole field. It is
assumed here that the tactical situation is similar to that of
antiaircraft fire against high-altitude bombers in World War II. The
defense can be regarded as successful if only a fairly small fraction
of the targets engaged are destroyed. On the other hand, the lethal
radius of the antiaircraft shell is so small that it is also quite
difficult to score a kill. Under these circumstances we are interested
only in increasing the number of very well aimed shots.

When we combine these assumptions with the path assumptions described
in Chapter 9 we are led to the data-smoothing solution formulated here,
in preference to the solution obtained with the statistical approach.
On the other hand, we might equally well envisage a situation in which
the target contained an atomic bomb or some other very destructive
agent, so that it becomes very important to intercept it, while the
lethal radius of the antiaircraft missile is correspondingly increased,
so that great accuracy is not needed for a kill. In this situation our
interest would be focused on the problem of minimizing the probability
of making large misses, and the solution furnished by the statistical
approach would be approximately the best obtainable.ᵈ


ᵈ In fairness to the statistical solution it should be pointed out that
it is also the best obtainable, without regard to the lethal radius of
the shell, if we replace the path assumptions made in Chapter 9 by a
"random phase" assumption. The path assumptions in Chapter 9 are almost
at the opposite pole from a random phase assumption, and represent a
deliberate overstatement, made in order to illustrate the theoretical
situation as clearly as possible.
Chapter 7

GENERAL FORMULATION OF THE DATA-SMOOTHING PROBLEM


ONE of the principal difficulties in any treatment of data smoothing is
that of stating exactly what the problem is and what criteria should be
applied in judging when we have a satisfactory solution. It is
consequently necessary to embark upon a rather extensive general
discussion of the data-smoothing problem before it is possible to
consider specific methods of designing data-smoothing structures. This
preliminary survey will occupy Chapters 7, 8, and 9. As a first step
this chapter will describe two of the general ways in which the
data-smoothing problem can be approached mathematically. The
formulation of the problem which is finally reached in Chapter 9 is not
the one which is most obviously suggested by these approaches. This,
however, does not lessen their value in characterizing the problem
broadly.


7.1  A PHYSICAL ILLUSTRATION


In an actual fire-control system the data-smoothing problem is usually
made fairly specific because of the particular geometry adopted in the
computer. It may be helpful to have some particular case in mind as a
touchstone in interpreting the general discussion. For this purpose the
most appropriate example is furnished by long range land-based
antiaircraft fire, since most of the analysis described in this
monograph was developed originally for its application to this problem.
It is usually assumed in the antiaircraft problem that the target flies
in a straight line at constant speed, and in one case at least the
computer operates by converting the input data into Cartesian
coordinates of target position and differentiating these to find the
rates of travel in the several Cartesian directions. These rates form
the basis of the extrapolation.

The process is illustrated in Figure 1. The input coordinates are
transformed into electrical voltages proportional to x_P, y_P, and z_P,
the Cartesian coordinates of present position, in the coordinate
converter at the left of the diagram. The extrapolation for x is shown
explicitly. It consists essentially in differentiating to find the x
component of target velocity, multiplying the derivative by the time of
flight t_f and adding the result to x_P to find x_F, the predicted
future value of x. A similar procedure fixes y_F and z_F. After the
addition of certain ballistic corrections, these three coordinates of
future position are transformed into gun aiming orders in the
coordinate converter shown at the right of the drawing. This last unit
also provides the time of flight required as a multiplier in the
extrapolation.

Figure 1. Dat[...]diction circuit.

The small irregularities in the input data caused by tracking errors
are greatly magnified by the process of differentiation. It is thus
necessary to smooth the rates considerably if a reliable extrapolation
is to be secured. The data-smoothing network for the x coordinate is
represented by N_1 in Figure 1. Since the Cartesian velocity components
are theoretically constants if the assumption of a straight line course
at constant speed is correct, a data-smoothing network in this computer
must be essentially an averaging device which gives an appropriately
weighted average of the fluctuating instantaneous rate values fed to
it. The problem of "smoothing a constant" is given special attention in
Chapter 10. Aside from the particular circuit of Figure 1, we may, of
course, be required to smooth a constant whenever the prediction is
based upon an assumed geometrical course involving one or more
parameters which are isolated in the circuit.
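
A discrete-time sketch of the x channel may make the roles of differentiation and averaging concrete. Everything numerical here is an assumption made for illustration: the one-second sampling interval, the 10-yard Gaussian tracking error, the 100 yards per second target rate, and the use of a plain equal-weight average as a stand-in for the network N_1 (the text asks only for "an appropriately weighted average").

```python
import random

def differentiate(samples, dt):
    """Finite-difference rates between successive position samples."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

def smooth_constant(rates):
    """Equal-weight average of the rates: an assumed stand-in for N_1."""
    return sum(rates) / len(rates)

def extrapolate(x_present, rate, t_f):
    """x_F = x_P + rate * t_f."""
    return x_present + rate * t_f

random.seed(0)
dt, true_rate, t_f = 1.0, 100.0, 20.0
truth = [true_rate * k * dt for k in range(21)]           # straight line, constant speed
observed = [x + random.gauss(0.0, 10.0) for x in truth]   # assumed tracking error

raw_rates = differentiate(observed, dt)
smoothed_rate = smooth_constant(raw_rates)

# Prediction from the latest unsmoothed rate versus the smoothed rate.
x_future_raw = extrapolate(observed[-1], raw_rates[-1], t_f)
x_future_smooth = extrapolate(observed[-1], smoothed_rate, t_f)
x_future_true = truth[-1] + true_rate * t_f

print("error with raw rate:     ", x_future_raw - x_future_true)
print("error with smoothed rate:", x_future_smooth - x_future_true)
```

One arithmetic point is worth noting: the equal-weight average of successive differences telescopes to (last sample minus first sample) divided by the elapsed time, so it uses only the two end observations. This is one reason the choice of weighting matters.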

In addition to smoothing the rates we can, if we like, attempt to
smooth the irregularities in present position also. A network to
accomplish this purpose is indicated by the broken line structure N_2
in Figure 1. Of course, in dealing with the present position we are no
longer smoothing a constant, but suitable structures can be obtained by
methods described later. However, the effect of tracking errors in the
present position circuit is so much less than it is in the rate circuit
that N_2 can generally be omitted.

Geometrical assumptions of the sort implied in Figure 1 are helpful in
visualizing the problem, and they are of course of critical importance
in determining what the final data-smoothing device will be. It is
important not to make explicit assumptions of this kind too early in
the formal analysis, however, since the meaning of such assumptions is
one of the aspects of the general problem which must be investigated.
For example, it is apparent that no airplane in fact flies exactly a
straight line, nor flies a straight line for an indefinite period. In
detail, the solution of the data-smoothing problem depends very largely
on how we treat these departures from the idealized straight line path.
For the present, consequently, it will be assumed that the input data
are presented to the data-smoothing and predicting devices in terms of
some generalized coordinates, the nature of which we will not inquire
into too closely. A given coordinate might, for example, be a velocity,
a radius of curvature, an angle of dive or climb, or any other quantity
which would be directly useful in making a prediction, or it might be a
simple position coordinate such as an azimuth or an altitude.

The data-smoothing and predicting operation itself is assumed to be
performed by linear invariable devices. Aside from the fact that this
assumption is, of course, a tremendously simplifying one, it also fits
the data-smoothing problem very nicely, as the problem is formulated in
this chapter. With other formulations, however, it appears that
somewhat better results may be obtainable from variable devices or
devices including more or less radical amounts of nonlinearity. These
possibilities are discussed briefly in Chapter 14.


7.2  DATA SMOOTHING AND PREDICTION

Figure 1 illustrates a distinction between two possible methods of
looking at the data-smoothing problem which it is advisable to
establish for future purposes. In describing the x system in Figure 1
we laid emphasis on the particular networks N_1 and N_2. It is clear,
however, that the complete x circuit with input x_P and output x_F is a
network having overall transmission properties which can be studied.
Since t_f will normally vary with time, the network is not, strictly
speaking, an invariable one, but the changes of t_f are ordinarily too
slow to make this an essential consideration.

When it is necessary to make a distinction between these points of
view, a network such as N_1, which is merely an element in the
prediction process, will be called a data-smoothing structure. An
overall circuit, providing data smoothing and prediction in one step,
will be called a data-smoothing and prediction network, or simply a
prediction network. Although these points of view have been illustrated
for rectangular coordinates, they obviously apply also in many other
situations. For example, we might go so far as to apply the overall
point of view to a complete circuit from input azimuth, say, to output
azimuth.

Both points of view are taken from time to time in the monograph. When
possible, however, principal attention has been given to the limited
data-smoothing problem. This tends to simplify the discussion, since
the limited problem is evidently more concrete than the overall
prediction problem. Moreover, it permits us to deal lightly with such
questions as the particular choice of coordinates in which the
smoothing operations are conducted, since it assumes that the general
kinematical framework of prediction has already been decided upon. On
the other hand, the overall point of view is more effective in certain
situations, and it is the only natural one in the statistical treatment
described in the next section.

7.3  DATA SMOOTHING AS A PROBLEM IN TIME SERIES

The most direct and perhaps the most general approach to data smoothing
consists in regarding it as a problem in time series. This is the
approach used by Wiener in his well-known work.¹ It essentially
classifies data smoothing and prediction as a branch of statistics. The
input data, in other words, are thought of as constituting a series in
time similar to weather records, stock market prices, production
statistics, and the like. The well-developed tools of statistics for
the interpretation and extrapolation of such series are thus made
available for the data-smoothing and prediction problem.

To formulate the problem in these terms, let f(t) represent the true
value of one of the coordinates of the target and let g(t) represent
the observational error. Then f(t) and g(t) are both time series in the
sense just defined. The set of all such functions corresponding to the
various possible target courses and tracking errors form an ensemble of
time series or a statistical population. One can imagine that a large
number of particular functions f(t) and g(t) have been recorded, each
with a frequency proportional to its actual frequency of occurrence.
Wiener assumes that they are stationary, that is, that the statistical
properties of the ensemble are independent of the origin of time. This,
of course, implies that both functions exist from t = -∞ to t = +∞. We
will sometimes find it more convenient to make the assumption that the
two functions vanish after some fixed, but sufficiently remote, points
on the positive and negative real t axis.ᵃ

The input signal to the computer is of course f(t) + g(t). If we assume
that the coordinate in question represents a position, the quantity we
wish to obtain is f(t + t_f), where t_f represents the prediction time.
If the coordinate is a rate, we are interested in an average value of
f(t) over the prediction interval. This complicates the mathematics
somewhat, but does not essentially affect the situation.


ᵃ This is done for technical mathematical reasons. We shall later have
occasion to consider the Fourier transforms of f(t) and g(t), and, to
have well-defined transforms, the integrals of the squares of the two
functions, from t = -∞ to t = +∞, should be finite. This would not
happen under the "stationary" assumption. Wiener avoids the difficulty
by introducing what he calls a generalized harmonic analysis, but this
method is far too complicated to be treated in a brief sketch like the
present.


We shall not, of course, be able to predict f(t + t_f) perfectly
accurately. Let the predicted value be represented by f*(t + t_f). In
virtue of our assumption that the data-smoothing and prediction circuit
is to be a linear invariable network, the relation between f*(t + t_f)
and the total input signal f(t) + g(t) can be written as

/*(<  +  </)  =  /  \M  +  gi<r))dK(a)  (1) 

where dK(σ) represents the effect of the data-smoothing and prediction
circuit. Comparison to equations (2) and (5) of Appendix A shows that K
is, in fact, the indicial admittance of this circuit. The particular
problem to be solved is of course that of finding a shape for the
function K(σ) which will make f*(t + t_f) the best possible estimate of
f(t + t_f).

The  fact  that  the  upper  limit  of  integration 
in  equation  (1)  is  taken  as  a  =  0  is  particu- 
larly to  be  noted.  It  corresponds  to  the  fact  that 
in  making  a  prediction  we  are  entitled  to  use 
only  the  input  data  which  has  accumulated  up 
to  the  prediction  instant.  This  restriction  will 
be  conspicuous  in  the  next  chapter,  where  the 
time-series  analysis  is  completed. 
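
A discrete-time analogue of equation (1) may help fix ideas. The sketch below is an illustration only, under stated assumptions: samples are taken at unit intervals, the list `weights` stands in for the increments dK, and the names `predict` and `two_point_weights` are hypothetical and do not come from the text. The feature carried over from equation (1) is that only present and past samples enter the sum.

```python
def predict(samples, weights):
    """Linear, time-invariant prediction from present and past input only.

    samples -- observed input f + g, oldest first; samples[-1] is "now"
    weights -- weights[j] multiplies the sample j steps in the past
               (a discrete stand-in for the increments dK; an assumption)
    """
    if len(weights) > len(samples):
        raise ValueError("not enough past data for these weights")
    return sum(w * samples[-1 - j] for j, w in enumerate(weights))


def two_point_weights(lead):
    """Hypothetical weights that extrapolate a straight line `lead` steps
    ahead: x[n] + lead * (x[n] - x[n-1])."""
    return [1 + lead, -lead]


# A noise-free straight-line course, x[n] = 3n + 2 for n = 0..5.
course = [3 * n + 2 for n in range(6)]
estimate = predict(course, two_point_weights(lead=4))
print(estimate)  # prints 29, the true value of the course at n = 9
```

With observational error present, weights of this kind would amplify the error, which is why the choice of K has to balance prediction against smoothing.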

THE AUTOCORRELATION

The principal statistical tool used in studying
equation (1) is the so-called autocorrelation.
Under the "stationary" assumption the
autocorrelation for f(t) is defined by

$$ \phi_1(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau)\, f(t)\, dt. \tag{2} $$

We can obtain a normalized autocorrelation,
which is more convenient for some purposes,
by dividing by φ_1(0). This gives

$$ \phi_{1N}(\tau) = \frac{\phi_1(\tau)}{\phi_1(0)} = \lim_{T \to \infty} \frac{\int_{-T}^{T} f(t + \tau)\, f(t)\, dt}{\int_{-T}^{T} [f(t)]^2\, dt}. \tag{3} $$

If we assume that f(t) in fact vanishes for
sufficiently large positive or negative values of
t, the limit sign can be disregarded and φ_1N(τ)
becomes simply

$$ \phi_{1N}(\tau) = \frac{\int_{-\infty}^{\infty} f(t + \tau)\, f(t)\, dt}{\int_{-\infty}^{\infty} [f(t)]^2\, dt} \tag{4} $$

where the denominator, the integral of [f(t)]^2 dt, represents the total
"energy" in the time series f(t).

Precisely similar expressions can be set up
for the autocorrelation φ_2(τ) or φ_2N(τ) of the
observational error function g(t). In a general
case we might also have to worry about
a possible cross correlation between f(t) and
g(t). This would be represented by a cross-correlation
function φ_12(τ), obtained by integrating
the product f(t + τ)g(t). In practical
fire control, however, it can be assumed that
the correlation between target course and
tracking errors is small enough to be neglected.

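As a simple example of the calculation of
an autocorrelation we may assume that f(t) =
sin ωt. Then

$$
\begin{aligned}
\phi_1(\tau) &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \sin \omega(t + \tau)\, \sin \omega t \; dt \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \tfrac{1}{2}\bigl[\cos \omega\tau - \cos(2\omega t + \omega\tau)\bigr]\, dt \\
&= \tfrac{1}{2} \cos \omega\tau,
\end{aligned}
\tag{5}
$$

since the term cos (2ωt + ωτ) will contribute
nothing in the limit.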
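
Equation (5) can be checked numerically. The sketch below is a rough illustration rather than part of the original analysis: the helper name `time_average_autocorrelation`, the midpoint-sum integration, and the particular values of ω and T are all assumptions made for the example. A finite T leaves a small residue from the cos (2ωt + ωτ) term, which shrinks as T grows.

```python
import math


def time_average_autocorrelation(f, tau, half_span, step=0.01):
    """Approximate (1/2T) * integral from -T to T of f(t + tau) f(t) dt
    by a midpoint sum, with T = half_span."""
    n = int(round(2 * half_span / step))
    total = 0.0
    for i in range(n):
        t = -half_span + (i + 0.5) * step
        total += f(t + tau) * f(t)
    return total * step / (2 * half_span)


omega = 2.0  # assumed angular frequency


def signal(t):
    return math.sin(omega * t)


for tau in (0.0, 0.5, 1.0, 10.0):
    numeric = time_average_autocorrelation(signal, tau, half_span=200.0)
    exact = 0.5 * math.cos(omega * tau)
    print(f"tau={tau:5.1f}  numeric={numeric:+.4f}  (1/2)cos(omega*tau)={exact:+.4f}")
```

The agreement at τ = 10 illustrates the point made in the text that the autocorrelation of a pure sinusoid does not die away at large τ.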

The maximum value of φ_1(τ) in (5) is found
at τ = 0. This is to be expected, since obviously
the correlation between identical values
of the function is the best possible. What
is exceptional about the present result is the
fact that φ_1(τ) is not small for all large τ's.
This is fundamentally a consequence of the
fact that we chose an analytic expression for
f(t), so that the relation between two values
of the function is completely determinate, no
matter how great the difference between their
arguments. In a more representative time
series, involving a certain amount of statistical
uncertainty, we would expect φ_1(τ) to approach
zero as τ increases, reflecting the increasing
importance of statistical dispersion as
the time interval becomes greater.

The significance of the autocorrelation function
for data smoothing and prediction is obvious
without much study. Thus, suppose for
simplicity that the observational error g(t)
is zero. Then the autocorrelation φ_1(τ) is the
only one involved. It is a measure of the extent
to which the true target path "hangs together"
and is thus predictable. For example,
in weather forecasting it is a well-known principle
that in the absence of any other information
it is a reasonably good bet that tomorrow's
weather will be like today's but that the
reliability of such a prediction diminishes rapidly
if we attempt to go beyond two or three
days. This would correspond to an autocorrelation
function which is fairly large in the neighborhood
of τ = 0, but diminishes rapidly to zero
thereafter.

In a similar way the autocorrelation of the
observational error g(t) represents the extent
to which this error hangs together. In this
case, however, a high correlation is exactly
what we do not want. Thus, if φ_2(τ) vanishes
rapidly as τ increases from zero, closely neighboring
values of g are quite uncorrelated, and
we need only average the input data over a
short interval in the immediate past in order
to have most of the observational errors averaged
out. If φ_2(τ) is substantial for a much
longer range, on the other hand, a much longer
averaging period is necessary, with correspondingly
greater uncertainties in the value
obtained for f(t).
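
The effect of error correlation on averaging can be made concrete with a small calculation. The sketch below assumes, purely for illustration, unit-variance errors sampled at equal intervals whose normalized autocorrelation at lag k is rho**|k|; neither this form nor the name `variance_of_average` comes from the text. It evaluates the variance remaining after a plain average of n samples.

```python
def variance_of_average(n, rho):
    """Variance of the mean of n unit-variance errors whose normalized
    autocorrelation at lag k is rho**|k| (an assumed, illustrative form).

    The double sum of rho**|i - j| over an n-by-n grid has n lag-zero
    terms and 2 * (n - k) terms at each lag k = 1 .. n - 1.
    """
    total = float(n)
    for k in range(1, n):
        total += 2 * (n - k) * rho ** k
    return total / n ** 2


for rho in (0.0, 0.5, 0.9):
    row = ", ".join(
        f"N={n}: {variance_of_average(n, rho):.3f}" for n in (1, 5, 20, 80)
    )
    print(f"rho={rho}:  {row}")
```

For rho = 0 the variance falls as 1/N, while for rho near 1 many more samples are needed to reach the same residual error, which is the longer averaging period referred to above.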

THE LEAST SQUARES ASSUMPTION

The autocorrelation function does not in itself
suffice to determine a time series completely.
For example, it is easily seen that the
functions sin t + sin 2t and sin t + cos 2t have
the same autocorrelation in spite of the fact
that they represent waves of quite different
shape. The autocorrelation function, however,
has a peculiar importance in the fact that
under many circumstances it is the only piece
of information about the time series which we
need to know.
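
The statement about sin t + sin 2t and sin t + cos 2t can be verified numerically from definition (2). The following sketch is an illustration under assumptions of its own: the helper `autocorrelation`, the finite averaging span, and the step size are choices made for the example, so the two results agree only to within the small residue left by a finite T.

```python
import math


def autocorrelation(f, tau, half_span=300.0, step=0.02):
    """Midpoint-sum estimate of (1/2T) * integral of f(t + tau) f(t) dt
    over -T..T, with T = half_span (finite, so only approximate)."""
    n = int(round(2 * half_span / step))
    total = 0.0
    for i in range(n):
        t = -half_span + (i + 0.5) * step
        total += f(t + tau) * f(t)
    return total * step / (2 * half_span)


def wave_a(t):
    return math.sin(t) + math.sin(2 * t)


def wave_b(t):
    return math.sin(t) + math.cos(2 * t)


print("wave shapes differ at t = 0:", wave_a(0.0), wave_b(0.0))
for tau in (0.0, 0.7, 2.0):
    expected = 0.5 * math.cos(tau) + 0.5 * math.cos(2 * tau)
    print(
        f"tau={tau:3.1f}  a={autocorrelation(wave_a, tau):+.3f}"
        f"  b={autocorrelation(wave_b, tau):+.3f}  expected={expected:+.3f}"
    )
```

Both waves give approximately (1/2) cos τ + (1/2) cos 2τ, since the autocorrelation retains the amplitude of each component but not its phase.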

The significance of the autocorrelation becomes
apparent as soon as we investigate the
error in prediction. In many mathematical situations
involving linear systems it is convenient
to deal with the square of the error rather
than with the error itself, since a first variation
in the error squared expression gives a
linear relationship in the quantities of direct
interest. We will deal with the square of the
error here. If E represents the instantaneous
error, f*(t + t_f) - f(t + t_f), the mean square
error over a long period of time is evidently

$$
\begin{aligned}
\overline{E^2} &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \bigl[f^*(t + t_f) - f(t + t_f)\bigr]^2\, dt \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [f(t + t_f)]^2\, dt
 - \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, f^*(t + t_f)\, dt \\
&\quad + \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [f^*(t + t_f)]^2\, dt.
\end{aligned}
\tag{6}
$$

The first integral in equation (6) can be
evaluated immediately. From (2) it is φ_1(0).
To evaluate the second integral replace f*(t
+ t_f) by its definition from (1). This gives

$$
\begin{aligned}
&-\lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, dt \int_0^\infty \bigl[f(t - \tau) + g(t - \tau)\bigr]\, dK(\tau) \\
&\quad = -\lim_{T \to \infty} \frac{1}{T} \int_0^\infty dK(\tau) \int_{-T}^{T} \bigl[f(t + t_f)\, f(t - \tau) + f(t + t_f)\, g(t - \tau)\bigr]\, dt
\end{aligned}
$$

if we reverse the order of integration. Since
we assume that f and g are uncorrelated, however,
the product f(t + t_f)g(t - τ) in this expression
makes no contribution to the final result,
and by replacing the integral of f(t + t_f)
f(t - τ) by its value in terms of φ_1 the expression
as a whole can be written as

$$ -2 \int_0^\infty \phi_1(t_f + \tau)\, dK(\tau). $$

The third integral in (6) can be simplified in
similar fashion. The final result becomes

$$
\overline{E^2} = \phi_1(0) - 2 \int_0^\infty \phi_1(t_f + \tau)\, dK(\tau)
 + \int_0^\infty dK(\tau) \int_0^\infty \bigl[\phi_1(\tau - \sigma) + \phi_2(\tau - \sigma)\bigr]\, dK(\sigma).
\tag{7}
$$

The only quantities appearing in equation
(7) are the autocorrelations, φ_1 and φ_2, of the
true target path and the observational error,
and the function K which specifies the data-smoothing
structure. The theoretical problem
with which we are confronted is evidently that
of choosing K to make the mean square error
as small as possible for any given φ's. This
problem will not be attacked here, although a
solution obtained by a somewhat indirect
method is presented in the next chapter. The
principal reason for deriving equation (7) is
to demonstrate the very important fact that
the mean square error depends only upon the
two autocorrelations. No other characteristics
of the input data need be considered.
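
The structure of equation (7) is perhaps easiest to see in a discrete form. The sketch below is an assumed analogue, not the monograph's own procedure: the integrals over dK are replaced by sums over a finite list `weights`, the prediction time by an integer `lead`, and the function name `mean_square_error` is hypothetical. Only the two autocorrelations and the weights enter the calculation.

```python
import math


def mean_square_error(weights, lead, phi1, phi2):
    """Discrete analogue of equation (7).

    weights -- weights[j] applied to the input j steps in the past
               (stand-in for the increments dK)
    lead    -- prediction time in steps (stand-in for t_f)
    phi1    -- autocorrelation of the true path, a function of integer lag
    phi2    -- autocorrelation of the observational error, same form
    """
    cross = sum(w * phi1(lead + j) for j, w in enumerate(weights))
    quad = sum(
        wi * wj * (phi1(i - j) + phi2(i - j))
        for i, wi in enumerate(weights)
        for j, wj in enumerate(weights)
    )
    return phi1(0) - 2 * cross + quad


omega = 0.3  # assumed angular frequency per sample


def phi_signal(lag):
    return 0.5 * math.cos(omega * lag)  # autocorrelation of sin(omega t), eq. (5)


def no_noise(lag):
    return 0.0


def white_noise(lag):
    return 0.04 if lag == 0 else 0.0  # assumed error level, uncorrelated


# Two-tap weights that reproduce a sinusoid one step ahead (hypothetical choice).
taps = [2 * math.cos(omega), -1.0]
print(mean_square_error(taps, 1, phi_signal, no_noise))     # essentially zero
print(mean_square_error(taps, 1, phi_signal, white_noise))  # error due to noise only
```

The second figure shows the price of a sharp predictor: weights that follow the signal exactly also pass, and here amplify, the observational error.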

It  will  be  recalled  that  the  mean  square  cri- 
terion was  introduced  originally  on  the  ground 
of  mathematical  convenience.  This  leaves  un- 
settled the  question  of  how  good  a  measure  of 
performance for a data-smoothing network it
actually  is.  This  is  a  critical  question,  since 
upon  it  depends  the  validity  of  the  whole  ap- 
proach outlined  in  this  chapter.  A  priori,  the 
least  squares  criterion  is  a  dubious  one  since 
it  gives  principal  weight  to  large  errors.  In 
fire  control  we  are  normally  interested  only  in 
shots  which  are  close  enough  to  register  as  hits. 
If  a  shot  misses  it  makes  little  difference 
whether  the  miss  is  large  or  small.  The  merits 
of  the  least  squares  criterion  are  considered 
in  more  detail  in  Chapter  9,  where  the  conclu- 
sion is  reached  that  the  criterion  is  probably 
adequate  for  many  problems  but  needs  to  be 
supplemented  or  replaced  in  others,  including 
the  special  case  of  heavy  antiaircraft  fire  to 
which  particular  attention  is  given  in  this 
monograph.  Pending  the  discussion  in  Chapter 
9,  the  least  squares  criterion  will  be  assumed 
to  be  a  valid  one,  with  the  understanding  that 
the  analysis  is  intended  primarily  for  its  value 
in  contributing  to  the  general  understanding  of 
the  data-smoothing  problem  rather  than  as  a 
means  of  fixing  the  exact  proportions  of  an  op- 
timal smoothing  network. 

DATA  SMOOTHING  AS  A  FILTER 
PROBLEM 

The  time-series  approach  to  data  smoothing 
is  closely  associated  with  another  which  at  first 
sight  may  seem  quite  different.  This  second 
approach  is  suggested  by  the  procedures  used 
in communication engineering. Here the signals,
be they voice, music, television, or what
not, are again time series. Instead of dealing
with actual signals varying in a more or less
irregular and random manner with time, however,
it is customary to deal with their equivalent
steady-state components on the frequency
spectrum.b

The  analysis  of  data  smoothing  can  conven- 
iently be  approached  by  supposing  that  both 
the  true  path  of  the  target  and  the  effects  of 
tracking  errors  are  represented,  in  a  similar 
way,  by  their  frequency  spectra.  When  the 
situation  is  presented  in  this  way,  however, 
there  is  an  obvious  analogy  between  the  prob- 
lem of  smoothing  the  data  to  eliminate  or  re- 
duce the  effect  of  tracking  errors  and  the  prob- 
lem of  separating  a  signal  from  interfering 
noise  in  communication  systems.  We  may  take 
as  an  example  of  the  latter  the  transmission 
of  voice  or  music  by  ordinary  radio  over  fairly 
long  distances,  so  that  the  effects  of  static  in- 
terference are  appreciable.  In  such  a  system 
a  reasonable  separation  of  the  desired  signal 
from  the  static  can  be  obtained  by  means  of 
a  filter.  In  a  representative  situation  an  ap- 
propriate filter  might  transmit  frequencies  up 
to perhaps 2,000 or 3,000 cycles per second,c
while  rejecting  higher  frequencies. 

The  choice  of  any  specific  cutoff,  such  as 
2,000  or  3,000  c,  in  the  radio  system  depends 
upon  a  compromise  between  conflicting  consid- 
erations. Both  speech  or  music  and  static  nor- 
mally include  components  of  all  frequencies 
which  can  be  heard  by  the  human  ear.  Thus, 
suppressing  any  frequency  range  below  the 
limits  of  audibility,  at  perhaps  10,000  or  20,000 
c,  will  injure  the  signal  to  some  extent.  The 
intensity  of  the  signal  components,  however, 
diminishes  rapidly  above  2,000  or  3,000  c,  while 
the  energy  of  the  static  interference  is  more 
evenly  distributed  over  the  spectrum.  Thus,  by 
passing only the first 2,000 or 3,000 c, we can
retain  most  of  the  signal  while  rejecting  most 
of  the  noise.  Naturally,  the  exact  dividing  line 
will  depend  upon  the  relative  levels  of  signal 
and noise power. If the static interference is
quite weak, for example, it would be worth
while to transmit a considerably wider band
in order to retain a more nearly perfect signal.
If the static level is extremely high, on the
other hand, it would be necessary to transmit a
still narrower band at the cost of greater
mutilation of the signal.

b The review of communication theory given in
Appendix A shows how this equivalence is established by
Fourier or Laplace transform methods.

c In practice, of course, the filtering would probably
take place in the radio-frequency circuits, but it is
more convenient here to think of it occurring in the
demodulated output.

The  separation  of  the  true  path  of  a  target 
from  the  observed  path  including  tracking 
errors,  as  a  preliminary  to  prediction  of  the 
future  position  of  the  target,  presents  an  ap- 
proximately analogous  situation.  Again  the 
spectrum  of  the  "signal"  or  true  path  is  con- 
centrated principally  in  a  low-frequency  band, 
in  most  instances,  while  the  energy  of  tracking 
errors  or  "noise"  appears  principally  at  con- 
siderably higher  frequencies.  Thus  the  two  can 
be  separated  by  a  low-pass  filter.  The  separa- 
tion, however,  is  not  complete  since  some  com- 
ponents of  the  signal  spectrum  extend  into  the 
noise  region.  Thus  the  smoothing  process  must 
be  accompanied  by  some  mutilation  of  the  sig- 
nal, and  the  optimum  compromise  is  again 
attained  from  a  filter  which  transmits  a  rela- 
tively broad  band  when  the  tracking  errors  are 
of  low  intensity  and  a  much  narrower  band 
when  they  are  large. 
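
The compromise described above can be illustrated with a deliberately simplified model. In the sketch below everything numerical is an assumption made for the example: the signal power density is taken as 1 / (1 + (f/corner)^2), the noise density as flat, and the filter as an ideal low-pass with a sharp cutoff, which ignores both prediction and physical realizability. The function names are hypothetical. The error is the signal power lost above the cutoff plus the noise power passed below it.

```python
import math


def total_error(cutoff, corner, noise_density):
    """Error power left by an ideal low-pass filter with the given cutoff.

    Assumes (for illustration only) a signal power density
    1 / (1 + (f / corner)**2) and a flat noise power density;
    frequencies are in cycles per second.
    """
    signal_lost = corner * (math.pi / 2 - math.atan(cutoff / corner))
    noise_passed = noise_density * cutoff
    return signal_lost + noise_passed


def best_cutoff(corner, noise_density, top=10.0, step=0.001):
    """Grid search for the cutoff that minimizes total_error."""
    grid = [i * step for i in range(int(top / step) + 1)]
    return min(grid, key=lambda fc: total_error(fc, corner, noise_density))


corner = 0.2  # assumed signal band of a few tenths of a cycle per second
for noise_density in (0.1, 0.01):
    print(noise_density, round(best_cutoff(corner, noise_density), 3))
```

In this model the best cutoff falls where the signal and noise densities are equal, so that weaker tracking errors call for a broader band, in line with the qualitative argument of the text.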

In  these  terms  the  most  obvious  difference 
between  the  data-smoothing  problem  and  the 
static  interference  problem  in  the  radio  system 
is  in  the  order  of  magnitude  of  the  frequencies 
involved.  They  are  roughly  10,000  times  smaller 
in  the  data-smoothing  case.  Thus,  the  typical 
signal  band  in  a  fire-control  system  may  cover 
a  few  tenths  of  a  cycle  per  second,  in  compari- 
son with  a  useful  band  of  2,000  or  3,000  c  in  a 
radio  system,  and  the  spectrum  of  tracking 
errors  or  noise,  with  representative  tracking 
devices,  includes  appreciable  components  up  to 
perhaps  2  or  3  c,  in  comparison  with  a  total 
effective  noise  band  in  the  radio  system  ex- 
tending to  the  limits  of  audibility  at  perhaps 
20,000  c. 

This  analogy  between  data  smoothing  and 
the  filtering  problems  which  appear  in  ordi- 
nary communication  systems  transmitting 
speech  or  music  must  of  course  not  be  carried 
too  far.  For  example,  previous  experience  with 
communication  filters  is  of  no  help  in  fixing  in 
detail  the  cutoff  in  attenuation  characteristic 
of the data-smoothing filter, since in communication
systems these choices depend on psychological
considerations of no relevance in the fire-control
problem. Methods of determining the
best  rules  for  proportioning  a  data-smoothing 
filter,  therefore,  remain  to  be  determined.  We 
may  also  notice  that,  whereas  the  time-series 
approach  was  of  the  data-smoothing  and  pre- 
diction type,  the  filter  approach  emphasizes 
data  smoothing  only.  The  addition  of  the  pre- 
diction function  can  be  expected  to  change  ma- 
terially the  overall  characteristics  of  the  cir- 
cuit. Neither  of  these  remarks,  however,  robs 
the  filter  approach  of  its  value  as  a  simple  way 
of  thinking  about  the  problem  qualitatively. 

7.7 RELATION BETWEEN TIME-SERIES
AND FILTER APPROACHES

The  time-series  and  filter  methods  of  looking 
at  data  smoothing  are  related  to  one  another 
by  the  fact  that  the  autocorrelation  can  be  com- 
puted from  the  amplitude  spectrum,  or  vice 
versa,  by  Fourier  transform  means.  Consider, 
for  example,  the  Fourier  transform  of  the 
autocorrelation. If we make use in particular
of (4) we have

F(ω) is of course the steady-state spectrum
of the signal f(t). Equation (8) thus states
that the Fourier transform of φ_1N is equal to a
constant times the square of the amplitude of
the steady-state spectrum. The amplitude
squared spectrum is, however, a measure of
the power per cycle. The relation is therefore
equivalent to the statement that the autocorrelation
and power spectrum are Fourier transforms
of each other.

Since  we  have  already  established  the  fact 
that  the  mean  square  error  in  prediction  de- 
pends only  on  the  autocorrelation,  this  analysis 
enables  us  to  conclude  immediately  that  the 
mean  square  error  can  also  be  calculated  from 
the  power  spectra  of  the  signal  and  noise.  It 
is  entirely  independent  of  the  phase  relations 
in either signal or noise. The phase characteristics
of the data-smoothing network, which
operates on the signal after a specific wave
shape has been established, are, of course, still
of consequence.
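
The relation between autocorrelation and power spectrum, and its independence of phase, can be checked on sampled data. The sketch below is a discrete, circular analogue chosen for convenience and is not the form used in the text: the direct `dft`, the circular autocorrelation over one period, and the two-tone test wave are all assumptions of the example.

```python
import cmath
import math


def dft(values):
    """Direct discrete Fourier transform (adequate for short sequences)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(values))
        for k in range(n)
    ]


def circular_autocorrelation(x):
    """Sum over m of x[m + lag] * x[m], with indices taken modulo len(x)."""
    n = len(x)
    return [sum(x[(m + lag) % n] * x[m] for m in range(n)) for lag in range(n)]


def two_tone(phase, n=32):
    """One period of sin(t) + sin(2t + phase), sampled at n points."""
    return [
        math.sin(2 * math.pi * m / n) + math.sin(4 * math.pi * m / n + phase)
        for m in range(n)
    ]


wave_a = two_tone(0.0)
wave_b = two_tone(math.pi / 2)  # second component becomes cos(2t)

power_a = [abs(c) ** 2 for c in dft(wave_a)]
transform_of_autocorrelation = [c.real for c in dft(circular_autocorrelation(wave_a))]

# Largest gap between the transform of the autocorrelation and the
# squared amplitude spectrum (rounding error only).
spectrum_gap = max(abs(p - q) for p, q in zip(power_a, transform_of_autocorrelation))
# Largest gap between the autocorrelations of two waves differing only in phase.
phase_gap = max(
    abs(p - q)
    for p, q in zip(circular_autocorrelation(wave_a), circular_autocorrelation(wave_b))
)
print(spectrum_gap, phase_gap)
```

Both gaps are at the level of rounding error, so that in this sampled setting the autocorrelation carries the power spectrum and nothing of the phase.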


PHYSICAL  AND  TACTICAL 
CONSIDERATIONS 


Thus  far  the  material  which  has  been  pre- 
sented has  been  primarily  mathematical.  It 
has  consisted,  in  other  words,  of  outlines  of 
general  analytical  methods  which  are  available 
for  use  with  the  data-smoothing  problem.  It  is 
also  possible  to  approach  the  problem  in  a 
much  more  concrete  fashion.  It  is  obvious  that 
by  giving  thought  to  the  details  of  the  physical 
characteristics  of  tracking  units  and  targets, 
and  to  the  tactical  situations  with  which  we 
expect  to  deal,  it  should  be  possible  to  draw  a 
number  of  specific  conclusions  about  the  prob- 
lem as  a  whole.  In  a  general  theory  of  the  de- 
sign and  tactical  use  of  fire-control  apparatus 
such  an  approach  might  well  be  a  primary  one. 
It  is  scarcely  possible  to  follow  it  in  detail  in 
the  present  discussion.  The  following  para- 
graphs, however,  indicate  some  of  the  kinds  of 
considerations  which  can  be  brought  into  the 
problem  in  this  way.  It  will  be  seen  that  they 
tend  to  modify  the  strictly  mathematical  ap- 
proach, partly  by  qualifying  to  some  extent  the 
assumptions  made  in  the  mathematics,  and 
partly  by  tending  to  give  much  more  emphasis 
to  particular  aspects  of  the  problem  than  would 
appear  in  a  general  analytic  outline. 


Choice of Coordinates

One  of  the  most  obvious  omissions  in  the 
general  analysis  thus  far  is  any  consideration 
of the choice of coordinates in which the data
smoothing is to take place. So far as either
the statistical or filter theory is concerned, the
coordinates in the data smoother may represent
either the original tracking data or any
transformation of them. The fact that there is
actually something to be decided here, however,
is easily seen from the long-range antiaircraft
problem. The input tracking coordinates for
antiaircraft would normally be azimuth, elevation,
and slant range. If the airplane flies in a
straight line roughly overhead, the general
shape of the azimuth and the azimuth rate as
functions of time are given by the curves in
Figure 2. The curves become indefinitely
steeper as the target path approaches the
zenith, and it will be seen that if the approach
is reasonably close, either the azimuth or the
azimuth rate must include a very substantial
amount of high-frequency energy. Since the
possibility of an effective separation between
the signal and noise in the filter approach depends
upon the assumption that the signal components
are of quite low frequency with respect
to the noise, the presence of this high-frequency
energy is evidently serious.

Figure 2. Azimuth and azimuth rate for crossing
target.
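
The behavior of the azimuth for a crossing target follows from elementary geometry, and a short calculation shows how sharply it depends on the closeness of approach. The sketch below assumes straight, level flight at constant speed past a fixed director; the function name, the units, and the numerical values are illustrative assumptions and are not taken from Figure 2.

```python
import math


def azimuth_and_rate(t, speed, miss_distance):
    """Azimuth (radians) and azimuth rate (radians per second) of a target
    in straight, level flight at constant speed.

    t             -- time in seconds, zero at the point of closest approach
    speed         -- ground speed of the target
    miss_distance -- horizontal distance from the director to the ground track
    (The geometry and all numbers used below are illustrative assumptions.)
    """
    along_track = speed * t
    azimuth = math.atan2(along_track, miss_distance)
    rate = speed * miss_distance / (miss_distance ** 2 + along_track ** 2)
    return azimuth, rate


speed = 100.0  # assumed, in yards per second
for miss in (2000.0, 500.0, 50.0):
    peak = azimuth_and_rate(0.0, speed, miss)[1]
    print(f"miss distance {miss:7.1f}: peak azimuth rate {math.degrees(peak):7.2f} deg/s")
```

The peak rate is the speed divided by the miss distance, so it grows without limit as the ground track approaches the director, although the motion of the target itself is as simple as it could be.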

When  the  target  describes  a  violently  evasive 
path  the  signal  spectrum  must  naturally  in- 
clude substantial  high-frequency  components, 
whatever  the  coordinate  system  may  be.  The 
high-frequency  components  indicated  in  Figure 
2,  however,  are  due  to  the  fact  that  the  target 
path  happens  to  pass  almost  over  the  director 
and  are  essentially  superimposed  upon  the 
high-frequency  components  which  reflect  the 
complexity of the target path itself. It is clear
as a matter of principle that an acceptable
coordinate  system  for  data  smoothing  should 
not  introduce  frequency  components  which  de- 
pend upon  such  accidental  factors  as  the  loca- 
tion and  orientation  of  the  coordinate  system. 
The  rectangular  system  mentioned  in  connec- 
tion with  Figure  1  evidently  meets  this  condi- 
tion; so  also  does  the  "intrinsic"  system  de- 
scribed in  the  next  section. 

Physical  Limitations  of  Target  or  Tracker 

We  may  also  approach  the  data-smoothing 
question  by  a  consideration  of  the  motions 
which  are  physically  possible  either  in  the 
target  or  in  the  tracking  device.  In  the  heavy 
antiaircraft  problem,  for  example,  there  are 
substantial physical limitations on the performance
possibilities of present-day aircraft.
We  can  be  quite  sure  that  any  motion  incom- 
patible with  these  limitations  is  necessarily  a 
tracking  error  and  can  be  removed  from  the 
incoming  data.  Naturally,  these  limitations 
must  appear  in  the  power  spectrum  of  the  sig- 
nal if  they  affect  the  mean  square  error  in  pre- 
diction, so  that  their  existence  in  no  way  dis- 
putes the  mathematical  framework  we  have 
set  up.  Consideration  of  the  physical  factors 
which  produce  them,  however,  may  permit 
them  to  be  established  more  easily  or  in  more 
clear-cut  fashion  than  would  be  possible  from 
a  statistical  examination  of  target  records 
alone. 

The  limitations  on  airplane  performance 
can  be  stated  most  simply  when  the  motion  of 
the  airplane  is  expressed  in  so-called  intrinsic 
coordinates.  These  are  the  speed  of  the  air- 
plane, its  heading,  and  its  angle  of  dive  or 
climb.  The  maneuvering  possibilities  of  a  con- 
ventional airplane  in  these  three  directions  are 
quite  unequal.  By  banking  sharply  it  can 
maneuver  violently  to  the  right  and  left  and 
thus  make  quick  changes  in  heading.  The  pos- 
sibilities of  maneuvering  up  and  down,  how- 
ever, are  considerably  less,  particularly  for  a 
heavy  airplane,  where  there  are  usually  restric- 
tions on  the  maximum  angle  of  dive  or  climb 
which  can  be  assumed.  The  possibilities  of 
quickly  changing  the  speed  of  the  airplane, 
finally, are almost nil. The thrust of an airplane
propeller is so small in comparison with
the mass of the airplane that only small accelerations
are possible.*

Thus  the  optimum  filters  for  the  three  coor- 
dinates should  be  different.  The  one  for  speed 
can  have  a  very  narrow  band,  since  most  of 
the  signal  energy  for  this  coordinate  occurs  at 
very  low  frequencies.  The  optimum  band  for 
the  angle  of  dive  or  climb,  however,  should  be 
larger  (unless  it  turns  out  that  pilots  seldom 
make  use  of  maneuvering  possibilities  in  this 
direction)  and  the  one  for  the  heading  larger 
still.  In  this  ability  to  discriminate  among  the 
various  possible  directions  of  motion  the  in- 
trinsic coordinate  system  is  evidently  an  im- 
provement even  on  the  rectangular  system. 


Settling  Time 


Another  aspect  of  the  data-smoothing  prob- 
lem which  has  not  been  given  conspicuous  at- 
tention in  the  purely  mathematical  discussion 
is  the  fact  that  in  an  actual  tactical  situation 
questions of elapsed time are of great importance.
Engagements usually begin suddenly
and  last  for  a  comparatively  brief  period,  and 
it  is  important  to  find  a  data-smoothing  scheme 
which  provides  adequate  firing  data  as  quickly 
as  possible  after  an  engagement  starts.  A  situ- 
ation essentially  similar  to  the  beginning  of  an 
engagement  may  also  be  presented  whenever 
the  target  makes  a  sudden  change  of  course  or 
whenever  it  is  necessary  to  shift  from  one 
target  to  another  in  a  given  attacking  body. 
The  time  required  for  a  computer  to  give 
usable  output  data  after  any  of  these  events  is 
its  so-called  "settling  time,"  and  is  one  of  the 
most  important  parameters  of  any  data- 
smoothing  system.  It  is  possible  to  make  rough 
estimates  of  settling  time  by  indirect  means  in 
both  the  statistical  and  filter  theories  of  data 
smoothing,  but  no  explicit  consideration  of 
necessary  time  lapses  appears  in  either  theory. 
Evidently,  the  fundamental  fault  lies  with  the 
"stationary"  assumption. 


*  This  ignores  the  possibility  of  changing  the  speed 
through  gravitational  forces.  Since  these  possibilities 
are  linked  to  the  angle  of  dive  or  climb,  however,  they 
can  be  predicted.  This  has  actually  been  done  in  one 
experimental  computer. 
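
The connection between settling time and smoothing band mentioned under Settling Time can be shown with the simplest possible smoother. The sketch below is an assumed example, not a circuit from the text: a first-order smoother updated as y += alpha * (x - y) on sampled data, where a smaller alpha corresponds to a narrower band. The function name and the 5 per cent tolerance are hypothetical choices.

```python
def settling_samples(alpha, tolerance=0.05):
    """Samples needed for the smoother y += alpha * (x - y) to come within
    `tolerance` of a unit step in x, starting from y = 0."""
    y, count = 0.0, 0
    while 1.0 - y > tolerance:
        y += alpha * (1.0 - y)
        count += 1
    return count


for alpha in (0.5, 0.1, 0.02):
    print(alpha, settling_samples(alpha))
```

The count rises roughly in inverse proportion to alpha, so the heavy smoothing that suppresses tracking errors is paid for by a long delay after a sudden change of course or of target.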


Effect  of  Human  Factors 

Aside  from  the  conditions  on  target  perform- 
ance which  arise  from  the  physical  character- 
istics of  the  target  itself,  there  are  others 
which  are  due  to  the  fact  that  the  target  is 
under  the  control  of  a  human  being  with  a 
definite  purpose.  The  language  of  the  statistical 
and  filter  methods  is  broad  enough  to  cover 
almost  any  situation.  It  tends  to  suggest,  how- 
ever, that  the  typical  target  paths  with  which 
we  deal  are  the  relatively  structureless  conse- 
quences of  random  physical  forces.  The  inter- 
vention of  purposive  human  behavior,  on  the 
other  hand,  tends  to  give  paths  which  fall  into 
more  or  less  definite  patterns.  A  simple  illus- 
tration is  furnished  by  the  argument  which  is 
frequently  offered  in  defense  of  the  straight 
line  assumption  in  dealing  with  antiaircraft 
defense  against  heavy  bombers.  It  is  contended 
that  while  the  targets  may  in  fact  engage  in 
substantial  evasive  maneuvers  during  most  of 
their  flight,  there  will  always  be  a  substantial 
period  during  the  bombing  run  in  which  they 
must  fly  very  straight  in  order  to  achieve 
bombing  accuracy.  On  the  basis  of  ordinary 
probability  we  would  of  course  expect  substan- 
tial straight  line  segments  quite  infrequently 
if  the  course  as  a  whole  shows  marked  disper- 
sion, and  the  intervention  of  the  human  pilot 
thus  provides  a  higher  degree  of  structure  than 
one  would  expect  in  a  corresponding  situation 
dominated  by  purely  natural  factors. 

A  broader  example  is  furnished  by  a  com- 
parison of  two  airplanes,  or  perhaps  more 
simply  of  two  boats,  one  of  which  is  under  the 
control  of  a  human  operator,  while  in  the  other 
the  steering  controls  are  lashed  in  a  neutral 
position.  Both  boats,  say,  may  be  expected  to 
experience  small  variations  of  course  due  to  the 
random  effects  of  wind  and  waves  upon  them. 
Over  a  short  period  of  time  the  observed  mo- 
tions of  the  two  boats  should  be  substantially 
identical.  In  the  case  of  the  boat  with  the 
lashed  helm  these  random  variations  will  tend 
to  accumulate,  so  that  it  is  possible  to  make  a 
reasonable  prediction  of  the  position  of  the 
boat  for  only  a  comparatively  short  distance 
in  the  future.  In  the  boat  with  the  human 
steersman,  on  the  other  hand,  we  may  expect 
corrections  to  be  applied  as  soon  as  the  random 
effects become large, so that the boat tends to
retain the same general course and it is possible
to predict its position hours or even days
later from a relatively brief observation.

Neither of these illustrations is inconsistent
with the mathematical framework laid down
earlier in the chapter, in a purely theoretical
sense. For example, the bombing run illustration
merely states that because of the presence
of the human operator there are definite phase
relations in the input signal. As we have seen,
such relations can exist without affecting
computations based on mean square error. The
comparison between the piloted and pilotless
boats can be interpreted as the result primarily
of differences in the signal power spectrum.
In the case of the pilotless boat, for example,
the signal occupies a fairly continuous low-frequency
band, while in the case of the piloted
boat it must be regarded as concentrated very
closely around zero frequency, so that it is
approximately a line spectrum superimposed on
a continuous one. The formal mathematical
theory covers also such cases as these.

The point of this discussion, however, is that
the mathematical theory, although it is sufficiently
general in a formal sense, fails to differentiate
between such situations as those
just described and the more shapeless sort
involving continuous spectra with random
phase relations, even if the special features in
these situations may be the controlling factors
in determining the actual probability of hitting.
If we could believe the bombing run
hypothesis, for example, and had a sufficiently
accurate computer and gun, we could expect
to score a hit in every engagement, no matter
how large the mean square error might be.
More generally, it is probably only the tendency
of targets to exhibit "line spectra" which
prevents the real probability of a kill, small
at best, from becoming microscopic. It is necessary
to lay special emphasis on these factors
in order to keep the overall fire control picture
in perspective.

CRITERION OF PERFORMANCE

Last on this list of doubts about the statistical
and filter theories, we may mention the
least squares criterion of accuracy. This was
discussed before, but it is mentioned again as
a matter of emphasis, and because of its close
relation with the factors we have just discussed.
For example, the bombing run illustration
obviously represents one situation in
which the mean square error is not a good
guide to the actual probability of scoring a hit.

STEADY-STATE ANALYS

It was shown in the previous chapter that
both the statistical and filter theory ways of
looking  at  the  data-smoothing  problem  lead 
naturally  to  an  analysis  in  terms  of  the  power 
spectra  of  the  signal  and  noise.  The  phase  rela- 
tions are  not  important  as  long  as  we  accept 
the  mean  square  error  as  a  criterion  of  per- 
formance. The  inadequacies  of  the  mean  square 
criterion  will  finally  force  us  to  abandon  the 
steady-state  attack  in  favor  of  a  direct  analysis 
in  terms  of  the  wave  shapes  of  some  assumed 
signals.  The  steady-state  attack  is  nevertheless 
a  very  useful  one.  This  chapter  will  conse- 
quently continue  the  analysis  from  this  point 
of  view.  It  will  be  assumed  as  heretofore  that 
the  heavy  antiaircraft  problem  is  the  particular 
subject  of  interest. 

A  large  part  of  the  discussion  hinges  upon 
the  conditions  which  must  be  satisfied  by  the 
external  characteristics  of  an  electrical  net- 
work if  it  is  to  be  capable  of  physical  realiza- 
tion in  any  way  whatever.  These  limitations 
and  the  characteristics  which  may  be  postulated 
for  physical  networks  are  decisive  since,  in  the 
absence  of  such  restrictions,  no  limits  could  be 
set  upon  the  performance  which  might  be  ex- 
pected from  data-smoothing  and  predicting 
circuits.  The  facts  about  physically  realizable 
networks  which  we  shall  find  of  most  use  are 
summarized  below,  but  the  reader  not  familiar 
with  this  field  is  urged  to  read  also  the  account 
given in Sections A.9 and A.10, Appendix A.

The  conditions  which  must  be  satisfied  by 
physically  realizable  networks  can  be  stated  in 
either  transient  or  steady-state  terms.  In  tran- 
sient terms  they  are  expressed  most  simply  by 
the  statement  that  the  response  of  a  physical 
network  to  an  impulsive  force  must  be  zero  up 
to  the  time  the  force  is  applied.  Thus  the  net- 
work has  no  power  to  predict  a  purely  arbi- 
trary event.  That  is,  it  has  no  way  of  foresee- 
ing whether  or  not  an  impulse  is  actually  going 
to  be  applied  to  it.  This  characteristic  of  physi- 
cal networks  is  taken  as  a  postulate. 

The steady-state limitations on physical networks
are expressed in terms of their attenuation
and phase characteristics. They may be
derived  either  from  the  transient  specification 
or  from  the  postulate  that  a  physical  network 
must  be  stable.  There  are  no  important  limita- 
tions to  be  placed  upon  the  attenuation  and 
phase  characteristics  of  physical  networks  as 
long as we deal with these characteristics separately,
but there are very severe limitations on
the  phase  characteristic  which  can  be  associated 
with  any  given  attenuation  characteristic  or 
vice  versa.  In  particular,  when  the  attenuation 
characteristic  is  prescribed,  there  is  a  definite 
formula  for  calculating  the  unique  limiting 
phase characteristic with which it may be associated.
This is the so-called "minimum phase"
characteristic  because  any  other  physical  net- 
work having  the  postulated  attenuation  char- 
acteristic must  have  as  great  or  greater  phase 
shift  at  every  frequency.  As  we  shall  see  later, 
this  greater  phase  characteristic  would  corre- 
spond to  longer  lags  in  obtaining  usable  data, 
so  that  the  minimum  phase  characteristic  is 
the  optimum  for  a  data-smoothing  network. 
The  minimum  phase  characteristic  has  the  addi- 
tional important  property  that  not  only  does 
it  specify  the  transfer  admittance  of  a  physical 
network,  but  the  reciprocal  of  that  transfer 
admittance  can  also  be  realized  by  a  physical 
structure.ᵃ

In  addition  to  this  principal  formula  for  the 
relation  between  attenuation  and  phase  there 
are  a  number  of  subsidiary  expressions  for 
special  aspects  of  the  problem.  One  in  partic- 
ular, relating  the  attenuation  to  the  behavior 
of  the  phase  characteristic  in  the  neighborhood 
of  zero  frequency,  is  used  extensively  in  this 
chapter. 


a In limiting cases, such as may be found when the
transfer admittance contains zeros or poles exactly on
the real frequency axis, the "physical structure" may
require such constituents as ideally nondissipative reactances,
perfect amplifiers with unlimited gain, etc.
This, however, is of no consequence for the present
general discussion.

8.1  THE SIGNAL SPECTRUM

It  is  natural  to  begin  with  a  discussion  of  the 
spectrum  of  a  typical  target  path.  Unfortu- 
nately no  data  on  the  spectra  of  actual  meas- 
ured airplane  paths  exist,  and  the  theoretical 
assumptions  which  may  be  made  about  paths 
of  airplane  targets  are  best  discussed  in  the 
next  chapter.  This  section  consequently  will  be 
confined  to  rather  general  observations  about 
the  problem.  It  will  be  convenient  to  assume 
for  definiteness  that  the  quantities  to  be 
smoothed  are  the  velocity  components  in  Car- 
tesian coordinates. 

The  simplest  point  of  departure  is  furnished 
by  the  conventional  assumption  that  the  target 
flies  in  a  straight  line  at  constant  speed.  If  we 
could  construe  this  assumption  literally,  it 
would  mean  that  the  velocity  spectrum  in  rec- 
tangular coordinates  would  reduce  to  a  single 
line  at  zero  frequency.  In  practice,  of  course, 
the  spectrum  is  not  so  simple.  Even  in  the 
absence  of  deliberate  maneuvering,  the  target 
will  fly  a  slightly  curved  path  because  of 
"wander."  Moreover,  even  if  the  target  could 
fly  exactly  straight,  the  single  line  spectrum 
would  apply  only  to  a  straight  course  in- 
definitely continued.  The  spectrum  becomes 
more  complicated  if  we  consider  the  fact  that 
tracking  must  have  begun  at  some  finite  time 
in  the  past,  or  that  the  target  may  presumably 
change  occasionally  from  one  straight  line 
course  to  another. 

As  a  result  of  both  these  causes,  the  actual 
signal  spectrum  must  be  regarded  as  occupying 
a  band  bordering  on  zero  frequency.  The  distri- 
bution of  energy  in  detail  will,  of  course, 
depend  on  particular  circumstances.  The  band 
has  no  very  well  defined  upper  limit,  but  in 
most  cases  the  great  bulk,  at  least,  of  the 
energy  should  be  below,  say,  one-fourth  or  one- 
fifth  of  a  cycle  per  second.  For  example,  the 
natural  periods  of  a  heavy  airplane,  which  one 
would  expect  to  be  correlated  with  wander,  are 
below this limit. This limit is also sufficient to
include  most  of  the  energy  resulting  from 
changes  in  course  occurring  as  frequently  as 
every  ten  or  twenty  seconds. 

In general, it is to be supposed that the signal
spectrum varies as $\omega^{-n}$, where n may be
1, 2, 3, depending on the frequency range. This
follows from general considerations of the
limitations of airplane performance. Thus, if
we suppose that the velocity changes discontinuously
from time to time, it follows from
general Fourier principles that the amplitude
must vary as $\omega^{-1}$. This is presumably a fair
representation of the actual signal spectrum at
low frequencies. At moderate frequencies, however,
we must take account of the fact that the
velocity can actually be changed rapidly but
not discontinuously, and we consequently
assume that the amplitude begins to vary as
$\omega^{-2}$. Finally, at frequencies of the order of perhaps
one cycle per second one must take account
of the fact that the airplane must bank
in order to turn. Since it takes some time to roll
into the bank, even the acceleration in the lateral
direction cannot be discontinuous, and
consequently the amplitude must begin to vary
as $\omega^{-3}$. The application of such successive limiting
factors in constructing a complete spectrum
is described in more detail in Section A.8
of  Appendix  A. 
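
The successive limiting factors described above can be sketched numerically. The following is only an illustration: the break frequencies `f1` and `f2` are hypothetical values chosen for the example (the text says only "moderate frequencies" and "perhaps one cycle per second"), and the function names are not taken from the report. The sketch builds a relative amplitude spectrum that falls as 1/f, then 1/f^2, then 1/f^3, joined continuously at the breaks, and measures the local slope on a log-log scale.

```python
import math

def amplitude_spectrum(f, f1=0.05, f2=1.0):
    """Relative amplitude at frequency f (cycles per second).

    Falls as 1/f below f1, as 1/f**2 between f1 and f2, and as 1/f**3
    above f2. The break frequencies f1 and f2 are illustrative
    assumptions, not values given in the text.
    """
    if f <= 0:
        raise ValueError("f must be positive")
    if f < f1:
        return 1.0 / f
    if f < f2:
        return f1 / f**2           # equals 1/f at f = f1
    return f1 * f2 / f**3          # equals f1/f**2 at f = f2

def log_slope(f, rel=1e-3):
    """Slope of log(amplitude) against log(frequency) near f."""
    lo, hi = f * (1.0 - rel), f * (1.0 + rel)
    rise = math.log(amplitude_spectrum(hi)) - math.log(amplitude_spectrum(lo))
    return rise / (math.log(hi) - math.log(lo))

for f in (0.01, 0.3, 5.0):
    print(f"f = {f:5.2f} c/s   amplitude = {amplitude_spectrum(f):10.4f}   "
          f"log-log slope = {log_slope(f):.3f}")
```

Each additional constraint on airplane performance steepens the slope by one unit, which is the sense in which n takes the values 1, 2, 3 in successive frequency ranges.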

One  other  general  condition  of  the  same  kind 
can be mentioned. It can be shown that the
integral from zero to infinity of $\log H/(1 + \omega^2)$,
where  H  is  the  power  spectrum,  is  very  impor- 
tant in  determining  the  properties  of  a  time 
series.  More  explicitly,  the  integral  converges 
if  the  series  is  essentially  statistical,  so  that  we 
cannot  foretell  the  future  from  the  past  with 
absolute  certainty.  This  of  course  is  the  case 
with  an  actual  signal  spectrum  in  a  fire-control 
problem.  It  implies  two  consequences;  first, 
that H cannot be zero over any finite band; and
second, that in the neighborhood of infinite frequency
H diminishes slowly enough so that
$|\log H|/\omega \to 0$.

8.2  THE NOISE SPECTRUM

The  spectrum  of  tracking  errors  depends 
largely  upon  the  particular  sort  of  tracking 
equipment  involved.  Broadly  speaking,  optical 
tracking  equipment  (at  least  that  of  the  present 
or  recent  past)  tends  to  produce  tracking  errors 
not  only  of  small  amplitude,  but  also  of  low 
frequency,  so  that  they  are  hard  to  separate 
from  the  signal  spectrum.  Radar  equipment,  of 
the  present  time,  produces  higher-frequency 
errors.  Relatively  high-frequency  errors  are 
particularly  likely  to  be  found  in  very  stiff 
automatic tracking radars.

A number of examples of spectra of tracking
errors  are  shown  in  Figures  1,  2,  and  3.  The 
spectra  are  given  directly  in  terms  of  range 
and  angle  errors.  To  make  them  comparable 
with the velocity spectra described previously
it would be necessary to multiply all amplitudes
by $\omega$. In addition, it would of course also be
necessary  to  multiply  the  angle  rates  by  some 
suitable  range  in  order  to  compare  them  di- 
rectly with  the  yards-per-second  rates  we  have 
otherwise  considered. 

After multiplication by $\omega$, the radar spectra
appear  to  be  about  flat  up  to  perhaps  one  cycle. 
Beyond  that  point  they  no  doubt  drop  off 
slowly,  although  the  accuracy  of  the  data  is  not 
sufficient  to  permit  the  situation  to  be  stated 
very  exactly. 


8.3  RANDOM NOISE FUNCTIONS


The  properties  of  the  signal  and  noise  as  we 
assume  them  here  can  be  conveniently 
expressed by reference to the theory of so-called
"random noise" functions.ᵇ A random noise can
be defined as a function which has a definite
amplitude spectrum but completely random
phase characteristics. The theory of such functions
is well developed because of their frequent
occurrence in physics. It is probable that
neither  our  noise  functions  nor  our  signal  func- 
tions are,  strictly  speaking,  random  noise  ac- 
cording to  this  definition.  Thus,  there  are  proba- 
bly certain  definite  phase  relations  in  our  noise 
functions  because  of  the  physical  character- 
istics of  tracking  devices.  There  is  no  evidence, 
however,  that  any  such  relations  are  important 
enough  to  be  significant  in  the  data-smoothing 
problem,  so  that  we  are  fully  justified  in  iden- 
tifying them  with  random  noise  functions  as 
defined  above.  The  phase  relations  in  the  signal 
are  by  no  means  random.  As  long  as  we  con- 
sider only  the  mean  square  error,  however,  this 
factor  is  immaterial,  and  we  can  replace  the 
actual  signal  by  a  random  noise  function  with 
the  same  power  spectrum  for  purposes  of 
analysis. 

b The fact that we also refer to tracking errors as
"noise" is, of course, merely a coincidence.

The most familiar example of a random
noise function is furnished by the thermal
voltage across a resistance R. This is a random
noise whose spectrum is constant up to very
high frequencies with the value $P = 4kTR$ (k
is Boltzmann's constant and T the absolute
temperature). A second example is black body
radiation. If there is black body radiation in a
space,  the  electric  (or  magnetic)  field  intensity 
at  a  point  is  a  random  noise  function  with 


spectrum

$$P(f) = \frac{8\pi h f^3}{c^3}\,\frac{1}{e^{hf/kT} - 1}$$


according  to  Planck's  law.  Random  noise  func- 
tions also  occur  in  the  Schottky  effect,  in 
Brownian  motion,  and  in  diffusion  and  heat 
flow  problems. 

For purposes of analysis, a random noise
function can be thought of as a function made
up of a large number of sinusoidal components,
which are very closely spaced in frequency
and whose phases are completely random.21,23
Thus a random noise can be represented as

$$\sum_{n=1}^{N} a_n \cos(\omega_n t + \phi_n)$$

where $\omega_n = n\,\Delta f$, $\Delta f$ being the frequency difference
between adjacent components. The phase
angles $\phi_n$ are random variables which are independent
with a uniform probability distribution
from 0 to $2\pi$. As $\Delta f$ decreases the functions
in this ensemble approach, in a certain sense,
a limiting ensemble, providing the amplitudes
$a_n$ are adjusted properly. What is desired is to
have the total power in the neighborhood of
each frequency approach a certain limit $P(f)$,
the power spectrum at that frequency. To do
this we make

$$a_n^2 = 2P(f)\,\Delta f.$$

In the limiting ensemble the total power within
a small frequency range $\Delta f$ is then $P(f)\,\Delta f$.
The function $P(f)$ completely describes the
random  noise  ensemble  from  the  statistical 
point  of  view. 
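
The representation above can be tried out directly. The sketch below is an illustration under stated assumptions, not the report's own procedure: frequencies are taken in cycles per second, so the cosine argument is written as 2*pi*f_n*t; the flat spectrum, the spacing `delta_f`, the number of components and the sampling grid are all hypothetical choices. It draws one member of the ensemble and compares its time-averaged power with the sum of P(f) times delta_f over the band.

```python
import math
import random

def synthesize_noise(power_spectrum, delta_f, n_components, times, rng):
    """One member of the ensemble: a sum of cosines with random phases.

    Component n has frequency n * delta_f (cycles per second, an assumed
    convention) and amplitude a_n with a_n**2 = 2 * P(f_n) * delta_f.
    """
    freqs = [n * delta_f for n in range(1, n_components + 1)]
    amps = [math.sqrt(2.0 * power_spectrum(f) * delta_f) for f in freqs]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    return [
        sum(a * math.cos(2.0 * math.pi * f * t + p)
            for a, f, p in zip(amps, freqs, phases))
        for t in times
    ]

def flat(f):
    """Flat power spectrum, P(f) = 1 over the band used here."""
    return 1.0

rng = random.Random(0)
delta_f = 0.01                           # spacing between components, c/s
n_components = 100                       # band from 0.01 to 1.0 c/s
times = [k * 0.1 for k in range(1000)]   # 100 s, one period of the lowest component
x = synthesize_noise(flat, delta_f, n_components, times, rng)

expected_power = sum(flat(n * delta_f) * delta_f
                     for n in range(1, n_components + 1))
measured_power = sum(v * v for v in x) / len(x)
print(f"sum of P(f) * delta_f : {expected_power:.6f}")
print(f"time-averaged power   : {measured_power:.6f}")
```

Sampling over exactly one period of the lowest component makes the components orthogonal on the grid, so the two figures agree to rounding error whatever phases are drawn. That is a property of this particular sampling choice rather than of random noise in general.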

A  particularly  important  special  case  is  that 
of  a  random  noise  with  a  constant  power  spec- 
trum. This  is  often  called  "flat"  or  "white" 
noise.  True  constancy  out  to  infinite  frequencies 
is  of  course  impossible  since  it  would  imply  an 
infinite  total  power  in  the  function.  The  idea 
is,  however,  still  useful  and  can  be  approxi- 
mated, as  with  resistance  noise,  by  having  a 
spectrum  which  is  constant  out  to  such  high 
frequencies  that  behavior  beyond  this  point  is 
of  no  importance  to  the  problem.  We  may  con- 
veniently think  of  flat  random  noise  as  being 
made  up  of  a  succession  of  weak  impulses  oc- 
curring frequently  but  at  random  times  with 
respect  to  one  another.  This  results  from  the 
fact  that  a  Fourier  analysis  of  a  single  impulse 
gives  a  flat  spectrum,  and  the  random  occur- 
rence of  many  of  them  produces  a  random  set 
of  phases.  In  a  physical  problem,  such  as  resis- 
tance noise  or  Brownian  motion,  these  im- 
pulses might  correspond  to  the  effects  of  indi- 
vidual small  particles.  Such  a  situation  is  of 
course  completely  chaotic.  If  the  impulses  are 
large  and  occur  relatively  infrequently,  the 
power  spectrum  is  still  flat,  though  the  func- 
tion is  no  longer  a  random  noise  function  as 
defined  here.  This  conception,  which  corre- 
sponds to  a  physical  situation  including  definite 
causative  elements,  will  be  revived  later  under 
the  name  of  the  elementary  pulse  method  of 
analysis. 

Random  noise  functions  have  a  number  of 
interesting  characteristics.  For  example,  they 
have the "ergodic property." This means that
averaging a statistic along the length of a particular
random function gives the same results
as averaging the same statistic over an
ensemble of functions having the same power
spectrum. Each function is typical of the
ensemble. To be more precise one must admit
exceptions, but the probability of an exception
is zero. For example, if we determine the fraction
of time a given random function f(t) has
a value greater than some constant A, it will
be equal to the fraction of all functions in the
ensemble which are greater than A at t = 0
(with probability 1).

A  second  characteristic  of  random  noise 
functions  is  the  fact  that  they  frequently  lead 
to  Gaussian  or  normal  law  distributions.  For 
example, the amplitudes of a random noise
function are distributed about zero in accordance
with the normal error law. Likewise, the
amplitudes  for  two  points  spaced  a  given  dis- 
tance apart  form  a  two-dimensional  normal 
error  law  distribution  when  we  consider  all 
possible  positions  of  the  first  point.  It  is  ap- 
parent that  if  the  signal  and  noise  are  actually 
random  functions  the  mean  square  error  is  as 
good  a  criterion  of  performance  as  any  other, 
since  it  completely  fixes  the  distribution  in  a 
normal  law  case. 

A  final  property  of  random  noise  functions 
is  the  fact  that  if  a  random  noise  is  passed 
through  a  filter  the  output  is  still  a  random 
noise. If the power spectrum of the noise is
$P(\omega)$ and the transfer characteristic of the
filter is $Y(i\omega)$, the output spectrum is
$P(\omega)|Y(i\omega)|^2$. In particular, if we take the
derivative of a random noise with spectrum
$P(\omega)$ we obtain one with spectrum $\omega^2 P(\omega)$.

This last property of random noise functions
suggests a method of representing them which
we shall find useful in the future. The method
is represented by Figure 4. It consists of a
source of flat noise followed by a shaping filter
to give the desired power spectrum. We can
easily assign to the filter the characteristics of
a physically realizable structure by making use
of the relations between attenuation and phase
mentioned earlier in the chapter. It is merely
necessary to convert the desired power spectrum
into a specification of the attenuation
characteristic of the filter and then use the
loss-phase formula to compute the corresponding
phase shift. It will be assumed that this
procedure has been followed when we make use
of this circuit at a later point.

Figure 4. Circuit representation of random functions.

The method of representing random functions
shown by Figure 4 illustrates graphically
the  basis  of  the  prediction  schemes  described 
thus  far.  The  flat  noise  is  of  course  absolutely 
unpredictable.  The  history  of  the  function  up 
to  any  given  instant  gives  no  indication  of  its 
value  even  a  microsecond  later.  The  filter,  how- 
ever, forces  the  output  current  to  have  a  cer- 
tain structure  on  which  a  prediction  may  be 
based.  For  example,  if  the  filter  will  pass  only 
very  low  frequencies  it  is  clear  that  the  output 
can  change  very  little  in  a  microsecond. 
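
The effect of the shaping filter can be illustrated with a small calculation. This is a sketch under assumptions of our own: the one-pole low-pass transfer function, the cutoff `omega_c` and the integration limits are hypothetical choices, not taken from the report. The output spectrum is formed as the input spectrum times |Y(i*omega)|^2, and the correlation between output values a lag tau apart is obtained by integrating the output spectrum against cos(omega * tau) (the Wiener-Khinchin relation). A correlation close to 1 at a lag of one microsecond expresses the statement that the output can change very little in that time.

```python
import math

def transfer(omega, omega_c):
    """Y(i*omega) for a one-pole low-pass shaping filter (illustrative choice)."""
    return 1.0 / complex(1.0, omega / omega_c)

def flat(omega):
    """Flat input noise spectrum, P(omega) = 1."""
    return 1.0

def output_spectrum(p_in, omega, omega_c):
    """Output power spectrum P(omega) * |Y(i*omega)|**2."""
    return p_in(omega) * abs(transfer(omega, omega_c)) ** 2

def correlation(tau, omega_c, omega_max=2000.0, steps=200_000):
    """Normalized autocorrelation of the filter output at lag tau (seconds).

    Midpoint-rule integral of P_out(omega) * cos(omega * tau) divided by
    the same integral at tau = 0. The band is truncated at omega_max,
    which is an approximation.
    """
    d = omega_max / steps
    num = den = 0.0
    for k in range(steps):
        w = (k + 0.5) * d
        p = output_spectrum(flat, w, omega_c)
        num += p * math.cos(w * tau)
        den += p
    return num / den

omega_c = 1.0                                          # rad/s, hypothetical cutoff
half_power = output_spectrum(flat, omega_c, omega_c)   # about 0.5 at the cutoff
rho_short = correlation(1e-6, omega_c)                 # lag of one microsecond
rho_long = correlation(1.0, omega_c)                   # lag of one second
print(f"output spectrum at cutoff : {half_power:.3f}")
print(f"correlation at 1 microsec : {rho_short:.9f}")
print(f"correlation at 1 second   : {rho_long:.4f}")
```

For this particular filter the correlation decays roughly as exp(-omega_c * tau), so the narrower the pass band, the longer the past remains a useful guide to the future.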

8.4  THEORETICAL PROPORTIONS FOR
A DATA-SMOOTHING FILTER

The  signal  and  noise  spectra  furnish  the  raw 
material  from  which  a  suitable  data-smoothing 
filter  can  be  deduced.  We  have  still  to  deter- 
mine, however,  the  exact  rule  for  choosing  the 
cutoff  and  attenuation  characteristic  of  the 
filter  from  these  spectra.  It  is  clear  that  previ- 
ous experience  with  signal-to-noise  problems 
in systems transmitting voice or music is no
help,  since  the  filter  proportions  here  depend 
upon  psychological  considerations  of  no  rele- 
vance to  the  fire-control  problem.  For  example, 
the  interfering  effect  of  a  small  amount  of 
noise  is  much  greater  than  one  might  expect 
from  energy  considerations,  especially  in  in- 
tervals of  low  message  level,  and  it  is  con- 
sequently worth  while  to  maintain  a  relatively 
high  level  of  attenuation  in  the  noise  band. 
Conversely,  the  breadth  of  the  band  required 
for  the  message  depends  as  much  on  the  ability 
of  the  ear  to  reconstruct  a  complete  signal 
from  an  incomplete  one  as  it  does  upon  the 
actual  signal  power  spectrum. 

In  the  data-smoothing  case  a  suitable  crite- 
rion, dependent  upon  more  physical  considera- 
tions, can  be  obtained  by  minimizing  the  rms 
error at the filter output. This criterion is
easily developed from the power spectrum approach,
and in a sense it is, of course, the only
possible  one  as  long  as  we  follow  the  methods 
developed  thus  far. 

A  very  general  theory  for  the  minimization 
of  the  rms  error  of  the  filter  output  has  been 
developed  by  Wiener.1  Since  the  power  spec- 
trum approach  is  not  the  one  we  shall  eventu- 
ally follow,  however,  it  is  not  necessary  to  give 
this analysis in detail. The nature of the relationships
can be seen from an elementary computation.
In Figure 5 let OA be a unit
vector representing the signal component at
some particular frequency. Let the amplitude
ratio between the input and output of the data-smoothing
filter be x, and let it be assumed that
the system is phase distortionless. This can
always be accomplished, at the cost of lag, by
phase equalization. Then the actual signal
output can be represented by OB, where
OB/OA = x. Let the ratio of noise power to
signal power at this frequency be $k^2$. Then the
output noise can be represented by the vector
BC, at some arbitrary phase angle $\theta$, where
BC/OA = kx.

Figure 5. Vector relation between input and output of data-smoothing network.

The error in the output of the data-smoothing
filter is evidently represented by the vector
AC. We have

$$(AC)^2 = (OA)^2\left[(1 - x - kx\cos\theta)^2 + (kx\sin\theta)^2\right]$$
$$\qquad = (OA)^2\left[(1 - x)^2 - 2kx(1 - x)\cos\theta + k^2x^2\right].$$

Since $\theta$ is random the cross-product term involving
$\cos\theta$ disappears on the average. (More
generally, it disappears as long as the noise and
signal are uncorrelated, whether or not their
relative phases are entirely random.) This
leaves the mean square error as

$$\overline{(AC)^2} = (OA)^2\left[1 - 2x + (1 + k^2)x^2\right]. \qquad (1)$$

The mean square error is a minimum if

$$x = \frac{1}{1 + k^2} = \frac{P_S}{P_N + P_S},$$

where $P_S$ and $P_N$ are, respectively, the signal
and noise power at this frequency. Upon substituting
this result in equation (1) and remembering
that $(OA)^2 = P_S$, we find that the
minimum mean square error is

$$\overline{(AC)^2}_{\min} = \frac{P_S P_N}{P_S + P_N}. \qquad (2)$$

Equation (2) evidently represents the sought-for
rule for the filter transmission characteristic.
It is illustrated in Figure 6, where $P_N$
and $P_S$ have been chosen respectively as the
flat curve and the $1/\omega^2$ curve in Figure 7. In
comparison with the characteristics of typical
filters in communication systems it is quite
rounded with a relatively slowly falling amplitude
characteristic. More important than the
detailed rule for the transmission characteristic,
however, is the conclusion that the shape
of the characteristic is not very critical. There
is very little loss in replacing the actual curve
in Figure 6 by any other similar characteristic.
For example, we might validate the
assumption of zero phase distortion by making
use of the curve which automatically gives a
linear phase shift.150

Figure 6. Optimum transmission characteristic for data smoothing assuming signals with random noise characteristics.

Figure 7. Signal and noise spectra assumed in Figure 6.

A  more  extreme  illustration  is  furnished  by 
the  infinitely  selective  filter  characteristic,  with 
perfect  transmission  in  the  range  in  which  the 
signal  power  is  greater  than  the  noise  power, 
and  zero  transmission  elsewhere,  indicated  by 
the  broken  lines  in  Figure  6. 

It  follows  from  equation  (1)  that  in  the 
neighborhood of the cutoff point $\omega_0$ the mean
square  error  for  this  filter  is  twice  that  of  the 
optimum  structure.  In  most  frequency  ranges, 
however,  the  penalty  is  far  less  than  this.  Since 
even  a  two-to-one  change  in  the  mean  square 
error  would  produce  no  tremendous  improve- 
ment in  the  effectiveness  of  fire,  it  is  clear  that 
the  result  to  which  we  are  led  by  this  method 
of  attack  is  by  no  means  critical. 
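
The comparison between the optimum characteristic and the infinitely selective filter can be checked numerically. The sketch below assumes a flat noise spectrum and a 1/omega^2 signal spectrum normalized so that the two cross at omega = 1; that normalization and the function names are hypothetical choices made for the example. It evaluates equation (1) for both filters and prints the ratio of the two mean square errors.

```python
def optimum_gain(p_signal, p_noise):
    """x = P_S / (P_S + P_N), the gain that minimizes equation (1)."""
    return p_signal / (p_signal + p_noise)

def mean_square_error(x, p_signal, p_noise):
    """Equation (1): P_S * [1 - 2x + (1 + k**2) * x**2], with k**2 = P_N / P_S."""
    k2 = p_noise / p_signal
    return p_signal * (1.0 - 2.0 * x + (1.0 + k2) * x * x)

def ideal_cutoff_gain(p_signal, p_noise):
    """Infinitely selective filter: pass where signal exceeds noise, else block."""
    return 1.0 if p_signal > p_noise else 0.0

def spectra(omega):
    """Assumed spectra: signal 1/omega**2 and flat noise, crossing at omega = 1."""
    return 1.0 / omega**2, 1.0

def penalty(omega):
    """Ratio of the ideal-cutoff error to the optimum error at frequency omega."""
    p_s, p_n = spectra(omega)
    best = mean_square_error(optimum_gain(p_s, p_n), p_s, p_n)
    sharp = mean_square_error(ideal_cutoff_gain(p_s, p_n), p_s, p_n)
    return sharp / best

for omega in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(f"omega = {omega:5.1f}   error ratio = {penalty(omega):.3f}")
```

The ratio reaches two only where the signal and noise powers are equal and falls toward one on either side, which is the sense in which the shape of the characteristic is not critical.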

8.5  LAGS IN DATA-SMOOTHING FILTERS

The analysis just concluded has been directed
at the amplitude characteristics of a data-smoothing
filter. By virtue of the relations between
the amplitude and phase characteristics
of physical networks mentioned earlier in the
chapter, however, the analysis permits us to
give at least a partial description also of the
phase characteristics of the filters. This is an
important consideration because it bears upon
the question of time delays in data-smoothing
systems which was mentioned in Chapter 7.

Figure 8. Some filter attenuation characteristics.

Figure 9. Corresponding minimum phase characteristics.

The general nature of the relationship in
simple cases is illustrated by Figures 8 and 9.
Figure  8  shows  a  series  of  rising  attenuation 
characteristics  equivalent  to  rather  unselective 
falling  amplitude  characteristics  of  the  general 
type  shown  by  the  principal  curve  in  Figure  6. 
Figure  9  shows  the  corresponding  phase  char- 
acteristics computed  on  a  minimum  phase  shift 
basis.  In  Figure  8  the  central  attenuation  char- 
acteristic B  has  been  so  chosen  that  the  corre- 
sponding phase  characteristic  in  Figure  9  is 
exactly  a  straight  line  at  low  frequencies, 
where  the  transmitted  amplitudes  are  appreci- 
able. Curves  A  and  C  in  the  two  drawings  show 
slightly  different  cases,  but  it  is  clear  from 
the  figures  that  the  tendency  of  the  phase 
characteristics  to  approximate  linearity  is  still 
marked. 

In  communication  engineering  a  phase  char- 
acteristic proportional  to  frequency  is  inter- 
preted as  indicating  a  delay  in  seconds  equal  to 
the slope $dB/d\omega$ of the phase characteristic.
This  relation  is  illustrated  most  simply  by  an 
ideal  line.  The  ideal  line  has  zero  attenuation 
combined  with  a  phase  shift  which  is  propor- 
tional to  frequency  and  which  at  any  given  fre- 
quency is  also  proportional  to  the  length  of  the 
line  in  question.  If  we  apply  any  arbitrary 
wave  to  the  line  it  is  propagated  down  the  line 
with  a  definite  velocity  and  unchanged  wave 
form. The time required for the wave to reach
any point on the line is equal to the slope of the
phase  characteristic  to  that  point. 

In  a  structure  like  a  filter,  which  has  an  at- 
tenuation characteristic  varying  with  fre- 
quency, it  is  of  course  no  longer  possible  to 
transmit  an  arbitrarily  impressed  wave  with- 
out change  in  wave  shape.  Even  if  the  applied 
wave  is  merely  a  suddenly  applied  d-c  voltage 
or  single  frequency  sinusoid,  there  is  a  tran- 
sient period  before  the  response  approximates 
its  final  value.  In  structures  having  a  substan- 
tially linear  phase  characteristic  over  any  fre- 
quency range  in  which  they  exhibit  an  appreci- 
able amplitude  response,  however,  this  total 
transient  characteristic  falls  naturally  into  two 
parts.  The  first  is  a  waiting  period  equal  to  the 
slope  of  the  phase  characteristic,  during  which 
the  response  is  very  small,  whereas  the  second 
is  a  true  transient  period  in  which  the  response 
is  substantial  but  does  not  resemble  the  final 
steady-state  response.  This  is  illustrated  by 
Figure 10 which shows the voltage at the fifth
section of a conventional low-pass filter in
response to a d-c voltage applied at zero time
at the input terminals. The end of the waiting
period, as deduced from the slope of the phase
characteristic, is indicated by the broken line.

Figure 10. Voltage at fifth section of conventional low-pass filter in response to unit d-c voltage.

Delays  of  the  sort  just  illustrated  must  be 
expected  in  a  data-smoothing  filter  whenever 
the  nature  of  the  signal  is  changed.  This  hap- 
pens at  the  beginning  of  tracking,  in  changing 
from  one  target  to  another,  or  even  in  follow- 
ing a  single  target  when  the  target  makes  an 
abrupt  change  in  course.  Since  usable  data  in 
a  fire-control  system  must  be  quite  accurate, 
the  delay  to  be  allowed  for  must  include  both 
the  initial  waiting  period  and  the  subsequent 


transient  period  until  the  transient  ripples 
have  almost  vanished.  A  considerable  part  of 
the art of designing data-smoothing networks
consists  in  controlling  the  design  so  that  these 
final  transient  ripples  decay  relatively  rapidly. 
We are not yet ready to discuss this problem.
It will turn out, however, that the minimum
interval which can be assigned to the "true
transient" period is about equal to that which
must be allowed for the initial waiting period.ᶜ
Thus the slope of the phase characteristic can
be  used  as  an  index  of  the  lags  which  must  be 
expected  in  data  smoothing  merely  by  doubling 
the  delay  to  which  the  slope  would  normally  be 
said  to  correspond. 

When  we  use  the  phase  slope  as  an  index  of 
delay  it  becomes  immediately  apparent  that 
lags  are  the  necessary  consequence  of  smooth- 
ing in  physical  circuits.  This  is  easily  seen  by- 
reference  to  the  relations  which  must  exist  be- 
tween attenuation  and  phase  characteristics  in 
physical  structures.  An  example  is  provided  by 
the  formula15*1 

(3) 

where A is attenuation, A0 is the attenuation
at zero frequency, and B is phase shift. In other
words, the delay (measured by the slope of the
phase characteristic at zero frequency) is proportional
to the integral of the attenuation on
an inverse frequency scale when the attenuation
at zero frequency is taken as the reference.
The equation thus states that the system will
exhibit a lagging response as long as there is a
net high-frequency attenuation. As a numerical
illustration, let it be supposed that A is zero
below ω = 1. This corresponds to the estimate
made earlier in the chapter that the input signal
components in antiaircraft work lie roughly
in the band below about 0.1 or 0.2 cycle per second.
Let it be supposed also that A at higher
frequencies is equal to 3 nepers, corresponding
to an average amplitude reduction of about 20
to 1. Then dB/dω at the origin is given from
equation (3) as 6/π seconds, and in accordance
with the rule just enunciated the minimum delay
to be expected from such a structure in a
data-smoothing application would consequently
be 12/π seconds.

c This is not intended to imply that the distinction
between the initial waiting period and the "true
transient" period is quite as sharp as it is in Figure 10. The
selectivity in a data-smoothing filter is usually not
great enough to justify the assumption that components
beyond the linear phase region are of negligible
importance.
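The arithmetic behind the two delay figures just quoted can be written out in one line. The sketch assumes that equation (3) has the form $(dB/d\omega)_{\omega=0} = (2/\pi)\int_0^\infty (A - A_0)\,d\omega/\omega^2$, which is the form implied by the numbers in the text; the integral runs only over the region above ω = 1, where A − A0 is 3 nepers.

```latex
\left(\frac{dB}{d\omega}\right)_{\omega=0}
  = \frac{2}{\pi}\int_{1}^{\infty}\frac{3}{\omega^{2}}\,d\omega
  = \frac{2}{\pi}\cdot 3
  = \frac{6}{\pi}\approx 1.9\ \text{seconds},
\qquad
2\cdot\frac{6}{\pi} = \frac{12}{\pi}\approx 3.8\ \text{seconds}.
```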

Aside from such specific quantitative relations,
equation (3) is useful as a basis for a
number of important qualitative conclusions.
One,  for  example,  is  the  fact  that  although  a 
lag  is  a  necessary  concomitant  of  any  system 
showing  a  high-frequency  attenuation,  the 
amount  of  the  lag  depends  greatly  upon  the 
portion  of  the  frequency  spectrum  in  which 
the  attenuation  is  found.  Since  the  integral  is 
taken  on  an  inverse  frequency  scale,  a  small 
attenuation  at  low  frequencies  is  much  more 
important  than  a  considerably  greater  attenua- 
tion further  out  in  the  spectrum.  This  points  to 
the  desirability  of  designing  tracking  instru- 
ments which  generate  principally  high-fre- 
quency noise,  even  if  the  amplitude  of  the  noise 
is  somewhat  increased  thereby.  We  may  also 
notice  that  since  the  attenuation  is  a  logarith- 
mic function  of  amplitude  an  initial  moderate 
reduction  in  the  amplitude  of  disturbing  noise 
may  be  much  less  expensive  in  lag  than  subse- 
quent attempts  at  further  reduction.  For  ex- 
ample, an  amplitude  reduction  from  100  to  10 
per  cent  over  a  given  portion  of  the  frequency 
spectrum  produces  no  more  lag  than  a  subse- 
quent reduction  from  10  to  1  per  cent. 
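Both qualitative conclusions can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes equation (3) in the form (2/π) times the integral of (A − A0)/ω² and an attenuation that is constant over a band, and the function name `lag_from_band` and the band edges are chosen here for the example rather than taken from the report.

```python
import math

def lag_from_band(atten_nepers, w_low, w_high=math.inf):
    """Lag (seconds) contributed to dB/dw at w = 0 by a constant attenuation
    of atten_nepers between w_low and w_high (radians per second).

    Assumes equation (3) in the form (2/pi) * integral of (A - A0)/w^2 dw.
    """
    inverse_scale_width = 1.0 / w_low - (0.0 if math.isinf(w_high) else 1.0 / w_high)
    return (2.0 / math.pi) * atten_nepers * inverse_scale_width

# Same band, successive tenfold amplitude reductions: each costs ln(10) nepers.
first_step = lag_from_band(math.log(100 / 10), 1.0)                # 100% -> 10%
second_step = lag_from_band(math.log(100 / 1), 1.0) - first_step   # 10% -> 1%

# A small attenuation at low frequency versus a larger one further out.
low_band = lag_from_band(0.5, 0.1, 0.2)   # 0.5 neper between w = 0.1 and w = 0.2
high_band = lag_from_band(3.0, 2.0)       # 3 nepers everywhere above w = 2

print(f"100% -> 10%: {first_step:.3f} s, 10% -> 1%: {second_step:.3f} s")
print(f"low band: {low_band:.3f} s, high band: {high_band:.3f} s")
```

The two tenfold reductions cost the same lag, while half a neper placed between ω = 0.1 and ω = 0.2 costs more lag than 3 nepers spread over everything above ω = 2.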

8.6    WIENER'S  PREDICTION  THEORY-
ZERO  NOISE  CASE 

In  Chapter  7  we  distinguished  between  what 
we  called  the  simple  data-smoothing  problem 
and  the  data-smoothing  and  prediction  prob- 
lem. The  simple  problem,  with  which  this  re- 
port is  chiefly  concerned,  is  the  one  which  has 
been  given  principal  attention  thus  far.  On 
account  of  its  broad  interest,  however,  it  seems 
worth  while  to  include  also  a  brief  statement 
of  Wiener's  solution  of  the  general  problem. 
The  method  of  development  used  here  is  intui- 
tive and  nonrigorous  in  comparison  with 
Wiener's  own  development,  but  it  permits  the 
principal  relations  to  be  established  by  very 
elementary  means. 

It  is  convenient  to  consider  first  the  zero 
noise  case.  The  past  history  of  the  signal,  then, 


is  known  perfectly,  and  the  existence  of  a 
prediction  problem  depends  entirely  upon  the 
fact  that  since  the  signal  is  assumed  to  be  sta- 
tistical in  character,  its  future  is  not  com- 
pletely determined  from  its  past.  The  situation 
can  be  thought  of  in  the  terms  suggested  by 
Figure 11. The actual signal output appears at
P1. In accordance with the discussion earlier
in the chapter, we imagine this signal to be
generated by passing flat noise through the
shaping network N1. The transfer admittance
Y1(iω) of N1 is determined from the power
spectrum of the signal by the procedure outlined
earlier and is a minimum phase shift characteristic.
It will be recalled that minimum
phase shift transfer admittances have the important
property that their reciprocals are also
the transfer admittances of physically realizable
networks.

Figure 11. Schematic representation of Wiener's
prediction theory when there is no noise.

From Y1 we can readily compute the transient
response characteristic of N1. We shall
assume for illustrative purposes that the impulsive
admittance of N1 takes the special
shape shown by Figure 12.


Figure  12.  Assumed  impulsive  admittance  of 
shaping  filter. 


The flat noise is thought of as consisting of
a large number of elementary impulses with
random amplitudes and occurring at random
times. For the purposes of this analysis, however,
it is sufficient to consider only the three
unit impulses shown in Figure 13. Impulse B
is supposed to occur at the instant at which
the prediction is to be made, A occurs two seconds
in the past, and C, one second in the
future. The response of N1 to these three impulses
will evidently be three curves of the
sort given by Figure 12, suitably displaced in
time as shown by Figure 14.



Figure  13.  Impulses  giving  rise  to  applied  signal 
through  shaping  filter. 

The  desired  output  of  the  predicting  network 
is  the  curve  of  Figure  14  advanced  by  the  pre- 
diction time,  which  we  can  assume,  for  illus- 
tration, to  be  two  seconds.  It  may  be  assumed 


Figure 14. Applied signal at P1.

for the sake of preliminary analysis that the
input of the predicting network is the three
original impulses of Figure 13. The terminal
P2 at which they are supposed to appear is of
course a purely fictitious one and is not accessible
to us physically. We can, however, construct
the equivalent terminal P'2 by imposing
the actual signal from terminal P1 on the network
N2, whose transfer admittance is the
reciprocal of that of N1.


Let the predicting network connected to terminal
P'2 be represented by N3. Obviously a
perfect prediction would be secured if N3 could
be assigned the impulsive admittance shown in
Figure 15, that is, an impulsive admittance
equal to the impulsive admittance of the original
network but moved forward by the 2-second
prediction time. Then all the constituent curves
and the sum curve in Figure 14 would similarly
be moved forward. Of course we cannot assign
N3 an impulsive admittance which is different
from zero at negative times without postulating
a nonphysical network. It is, however, perfectly
possible to define N3 from the portion of
the impulsive admittance characteristic at positive
times, with the remainder set equal to
zero. This gives an impulsive admittance of
the type shown by Figure 16. When energized
by the three unitary impulses, it gives the
result shown in Figure 17. The contributions
of impulses A and B are not affected by the
absence of a negative time portion of the impulsive
admittance, but the contribution of impulse
C is lost.

Figure 15. Ideal impulsive admittance of prediction
network N3 in Figure 11.
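The truncation just described can be tried numerically. The sketch below is not the report's network: the admittance t·exp(−t) is a hypothetical stand-in for the curve of Figure 12, and all function names are chosen here for illustration. It applies the three impulses A, B, and C, compares the ideal (nonphysical) and realizable predictors at the instant of prediction, and also integrates the square of the admittance over the prediction interval, the segment that truncation discards and the quantity the text goes on to relate to the mean square error.

```python
import math

def shaping_admittance(t):
    """Hypothetical impulsive admittance of the shaping network (stand-in for Figure 12)."""
    return t * math.exp(-t) if t >= 0.0 else 0.0

def ideal_predictor(t, lead):
    """Nonphysical admittance: the shaping admittance moved forward by lead seconds."""
    return shaping_admittance(t + lead)

def realizable_predictor(t, lead):
    """Physical admittance: the advanced curve with its negative-time part set to zero."""
    return ideal_predictor(t, lead) if t >= 0.0 else 0.0

def response(admittance, impulse_times, t, lead):
    """Sum of the responses at time t to unit impulses applied at impulse_times."""
    return sum(admittance(t - t_k, lead) for t_k in impulse_times)

def discarded_energy(lead, steps=20000):
    """Midpoint-rule integral of the squared admittance over the interval 0..lead."""
    h = lead / steps
    return h * sum(shaping_admittance((k + 0.5) * h) ** 2 for k in range(steps))

LEAD = 2.0                                   # prediction time in seconds, as in the text
IMPULSES = {"A": -2.0, "B": 0.0, "C": 1.0}   # impulse times relative to the prediction instant

now = 0.0                                    # instant at which the prediction is made
ideal = response(ideal_predictor, IMPULSES.values(), now, LEAD)
actual = response(realizable_predictor, IMPULSES.values(), now, LEAD)
lost = ideal - actual                        # equals the contribution of impulse C alone

print(f"ideal {ideal:.4f}, realizable {actual:.4f}, lost {lost:.4f}")
print(f"energy of discarded segment: {discarded_energy(LEAD):.4f}")
```

With these assumptions the difference between the two outputs is exactly the response to impulse C, which lies in the future and so cannot reach a physical network.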

To formulate a physical prediction network
we have merely to find by conventional methods
the steady-state admittance Y3 corresponding
to the impulsive admittance of Figure
16. The two networks N2 and N3 may then be
combined to give a single structure with the
transfer admittance Y2Y3 = Y3/Y1 which will
give the complete prediction when energized by
the actual signal.

Figure 16. Realizable portion of required impulsive
admittance.

Figure 17. Response of realizable prediction network.

The mean square error in prediction is
easily determined from the fact that the contributions
of all impulses of the sort represented
by C, occurring in the prediction interval,
are lost. Since impulses in the flat noise
source occur at random times the mean square
error is proportional to $\int_0^{\alpha} W^2(\tau)\,d\tau$, where α
is the prediction time and W is the impulsive
admittance of Figure 16. Since the flat noise
impulses occurring after the time at which the
prediction is made are surely unpredictable, it
is clear that this error is the least we could
expect any physical prediction network to have.

8.7    WIENER'S THEORY-GENERAL CASE

When the input data includes noise as well as
the signal it is natural to think of the situation
in the manner shown by Figure 18. The first
source of flat noise, together with the shaping
network Na, is the combination we have already
used to represent the signal in the noise-free
case. The addition of noise is represented by
the second independent source of flat noise with
its associated shaping network Nb. They combine
to give the total input measured at P1.

Figure 18. Circuit representation of random functions
representing signal and noise.

This diagram emphasizes the fact that we
think of the noise and signal as originating
from different physical sources. By postulate,
however, we are not able to separate the
sources experimentally. So far as any observed
result is concerned, consequently, we may as
well deal with the simplified structure shown
in Figure 19 which contains a single source of
flat noise and a single shaping network. The
transfer admittance of the shaping network N1
is determined by adding the power spectra of
signal and noise, converting the result to an
amplitude characteristic, and computing the
corresponding minimum phase according to
the methods already used for the noise-free
case.ᵈ

Figure 19. Schematic representation of Wiener's
prediction theory when there is noise.
Although  we  cannot  separate  the  signal  from 

d Note that the shaping network thus obtained is not
the same as the one we would secure by adding the
transfer admittances of Na and Nb in Figure 18 directly.
In order to realize the same total power at P1
in each case, it is necessary to begin by adding the
powers rather than the amplitude characteristics associated
with the two paths.




the noise completely, we saw earlier that the
mean square difference between the total input
and the signal is minimized if we multiply the
amplitude of the input at each frequency by
the ratio of the signal power to the sum of the
signal and noise powers. A fictitious filter
having the prescribed amplitude characteristic
is represented by N4 in Figure 19. We assign
N4 a zero phase characteristic so that there
may be no lag in producing the result at P3.
Thus the output at P3 at any instant represents
the best conceivable estimate (in the least
squares sense) of the signal at that instant.
The assumption of zero phase, of course, makes
N4 nonphysical, since it must have at least the
minimum phase characteristic associated with
its prescribed amplitude characteristic. This,
however, is not an objection here since the
structure is introduced purely for purposes of
analysis.
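The power-ratio weighting, and the point that powers rather than amplitudes are added when forming the single shaping network, can be illustrated as follows. The two spectra are hypothetical shapes chosen only for the example (a low-frequency signal and a broad, weak noise), and the function names are not from the report.

```python
import math

def signal_power(w):
    """Hypothetical signal power spectrum, concentrated at low frequencies."""
    return 1.0 / (1.0 + (w / 0.5) ** 2) ** 2

def noise_power(w):
    """Hypothetical noise power spectrum, flat out to well beyond the signal band."""
    return 0.05 / (1.0 + (w / 20.0) ** 2)

def smoothing_weight(w):
    """Amplitude factor of the fictitious zero-phase filter: S / (S + N)."""
    s, n = signal_power(w), noise_power(w)
    return s / (s + n)

def combined_amplitude(w):
    """Amplitude of the single shaping network: square root of the summed powers."""
    return math.sqrt(signal_power(w) + noise_power(w))

def amplitude_sum(w):
    """What adding the two amplitude characteristics directly would give instead."""
    return math.sqrt(signal_power(w)) + math.sqrt(noise_power(w))

FREQUENCIES = (0.0, 0.5, 1.0, 2.0, 5.0)   # radians per second
for w in FREQUENCIES:
    print(f"w={w:4.1f}  weight={smoothing_weight(w):.3f}  "
          f"power-sum amplitude={combined_amplitude(w):.3f}  "
          f"amplitude-sum={amplitude_sum(w):.3f}")
```

The weight stays near unity where the signal dominates and falls toward zero where only noise remains; the power-sum amplitude is always smaller than the direct sum of amplitudes, so the two constructions do not give the same total power.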

The situation is now reduced to a form in
which it is substantially equivalent to the one
appearing in the zero-noise case. We assume a
series of random impulses at P2, which would
produce responses at P3. The problem is that
of advancing the response to each impulse so
that the same result appears α seconds earlier
at terminal P4. The solution is represented by
networks N2 and N3, which discharge functions
similar to those of the correspondingly labeled
networks in Figure 11. Thus, the network N2
is the reciprocal of N1 and is provided to make
terminal P'2 equivalent to P2 as a source of impulses.
Network N3 is defined by an impulsive
admittance obtained from the impulsive admittance
between P2 and P3 by advancing the
latter characteristic α units in time and then
discarding the portion at negative time.

In this procedure there is only one point at
which the situation differs from that without
noise. In the noise-free case, the original impulsive
admittance which we wished to advance
in time was identically zero at negative times.
In order to secure a physically realizable result,
we needed only to discard the portion of the
impulsive admittance between t = 0 and t = α.
In the present situation, on the other hand, the
impulsive admittance is taken from a path including
the nonphysical network N4. Thus the
admittance may be expected to take such form
as that shown in Figure 20, with nonzero amplitudes
at both negative and positive times,
and in order to secure a physical final network
it is necessary to discard everything to the left
of the line α.


Figure 20. Typical impulsive admittance of best
smoothing network N4 in Figure 19.

This  difference  in  the  impulsive  admittance 
characteristics  has  two  consequences.  The  first 
is  the  fact  that  since  the  uncertainty  of  the 
prediction  is  measured  by  the  amount  of  im- 
pulsive admittance  which  must  be  discarded, 
it  is  evidently  greater  in  the  present  case  where 
we  are  discarding  much  more.  The  second  is 
the  fact  that  in  the  noise-free  case  uncertainty 
exists  only  for  a  positive  prediction  time.  A 
negative  prediction  time,  which  corresponds,  of 
course,  to  the  determination  of  the  value  as- 
sumed by  the  signal  at  some  time  in  the  past, 
can  be  set  into  the  analysis  as  easily  as  a  posi- 
tive prediction  time,  merely  by  shifting  the  im- 
pulsive admittance  to  the  right  rather  than  the 
left.  In  the  noise-free  case,  however,  there  is 
nothing  to  be  discarded  when  we  shift  to  the 
right,  since  the  impulsive  admittance  with 
which  we  begin  is  in  any  case  identically  zero 
for  negative  times.  Thus  the  uncertainty  in 
the  determination  of  any  past  value  of  the  sig- 
nal is  zero.  Since  we  have  postulated  no  noise 
to  confuse  the  data,  this  is,  of  course,  an 
inevitable result. As soon as noise is included,
on the other hand, there is no such sharp distinction
between the future and the past.ᵉ The
uncertainty in the determination of the true
value of the signal in the near past is almost
as great as it is in estimating what the signal
will be in the near future. As we go further
and further into the past the uncertainty
gradually diminishes. If we can allow ourselves
unlimited lag, we at length reach a point at
which the discarded portion of the impulsive
admittance characteristic is negligibly small.
This, however, does not mean that all uncertainties
have disappeared, but merely that we
can base our estimate of the signal upon the
power-ratio rule developed previously.

e This statement is to be understood in a physical
rather than a mathematical sense. It is not intended
to imply that there may not be sharp changes of behavior
in the impulsive admittance at zero.
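The contrast between the two cases can be made concrete by computing how much of the impulsive admittance has to be discarded for various prediction times, positive and negative. The sketch is only illustrative: exp(−t) for t ≥ 0 stands in for a noise-free admittance and exp(−|t|) for a two-sided admittance of the kind shown in Figure 20; both shapes, and the function names, are assumptions made here.

```python
import math

def noise_free_admittance(t):
    """Hypothetical one-sided admittance: zero at negative times (noise-free case)."""
    return math.exp(-t) if t >= 0.0 else 0.0

def noisy_admittance(t):
    """Hypothetical two-sided admittance of the kind sketched in Figure 20."""
    return math.exp(-abs(t))

def discarded_energy(admittance, lead, t_min=-30.0, steps=60000):
    """Midpoint-rule integral of admittance^2 over everything to the left of t = lead.

    Advancing the curve by lead and dropping negative times discards exactly
    this part. A negative lead stands for estimating a past value (a lag).
    """
    h = (lead - t_min) / steps
    return h * sum(admittance(t_min + (k + 0.5) * h) ** 2 for k in range(steps))

for lead in (2.0, 0.5, 0.0, -0.5, -2.0, -5.0):
    clean = discarded_energy(noise_free_admittance, lead)
    noisy = discarded_energy(noisy_admittance, lead)
    print(f"lead {lead:+.1f} s: noise-free {clean:.4f}, with noise {noisy:.4f}")
```

In the noise-free column the discarded part drops to zero as soon as the prediction time is zero or negative. In the two-sided column it shrinks only gradually as more lag is allowed.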

8.8      OVERALL  CHARACTERISTICS  OF
PREDICTING  NETWORKS 


It has been fairly easy to develop a qualitative
picture of the general characteristics of
typical data-smoothing networks. As we have
seen,  they  have  amplitude  characteristics  of  the 
low-pass  filter  type  combined  with  lagging 
phase  shifts.  No  corresponding  qualitative  pic- 
ture of  the  characteristics  of  a  typical  overall 
predicting  circuit  has,  however,  been  developed 
as  yet.  The  discussion  just  concluded  provides 
a  rule  for  determining  the  characteristics  of  a 
predicting  circuit  in  any  given  case,  but  pro- 
vides comparatively  little  in  the  nature  of  a 
description  of  the  result  we  may  expect  to 
secure. 

In any particular situation we can, of course,
calculate the overall characteristics of the predicting
circuit. A simpler way of characterizing
the overall predictor characteristic qualitatively,
however, is based upon the use of the
attenuation-phase relations for physical networks.
We need merely use such an equation
as (3) backward. Thus, we have previously
shown that a positive phase slope corresponds
to a lagging output. Correspondingly, a negative
phase slope can be interpreted to represent
a lead, or in other words, a prediction.ᶠ


If we assign (dB/dω) at ω = 0 in equation (3) a
negative value, we see that A − A0 must on the
average be negative. In other words, the amplitude
characteristic of an overall prediction
circuit must rise, on the average, as we proceed
upward from zero frequency. This is in marked
contrast to a data-smoothing network, which,
as we have seen, tends to have a low-pass filter
type of characteristic with a falling amplitude
characteristic at high frequencies. The increased
amplitude of response may have two
detrimental effects. In the first place, it evidently
produces a distorting effect on any signal
components to which it applies. In the
second place, it produces an exaggerated response
to noise.

Examples  of  the  characteristics  of  overall 
prediction  circuits  are  readily  constructed  by 
reference  to  the  circuit  of  Figure  21.  Various 


Figure  21.  One-dimensional  prediction  circuit 
with  data-smoothing  networks. 


f This, of course, does not mean that a network with
a negative phase slope can predict a perfectly arbitrary
event. We can hope to realize a negative phase slope,
in combination with a flat amplitude characteristic,
over only a finite band. The spectrum of an arbitrary
event, that is, any suddenly applied signal, will always
include important components running out to infinite
frequency, where the negative phase slope can no longer
be realized. The statement does, however, mean that if
we suddenly apply a signal made up of one or more
low-frequency sinusoids, and wait for the steady state
to become established, the output will appear to lead
the input by a time equal to the slope of the negative
phase characteristic.


particular results are obtained by assigning
particular characteristics to the data-smoothing
network. Thus, if the data-smoothing network
is absent entirely the transmission
through the path containing the differentiator
is iωt_f, since differentiation is equivalent to
multiplication by iω. The attenuation of the
overall circuit is consequently A = −log
|1 + iωt_f|. This is plotted as curve I of Figure
22. The increasing amplitude characteristic at
high frequencies is obviously due fundamentally
to the increased transmission through the
differentiator circuit.

If the data-smoothing network is assigned
the characteristic (1 + iωα)^-1, corresponding to
a very simple low-pass filter type of response,
the overall transmission becomes that shown
by curve II in Figure 22. (It is assumed that
α = t_f, for simplicity.) The negative attenuation
at high frequencies is much reduced. This is
paid for by an increased amplitude of response
at low frequencies, but since the integration in
(3) takes place on an inverse frequency scale,
the low-frequency increment is much less than
the gain reduction at high frequencies. Curve
III shows the result when the data-smoothing
network is assigned the characteristic
(1 + iωα)^-2. Finally, curve IV shows the result
obtainable when there is also a filter in the
present-position circuit (as shown by the
broken lines in Figure 21), so that there may
be a net positive attenuation at high frequencies.

Figure 22. Attenuation characteristics of prediction
circuit shown in Figure 21.
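The trade described for curves I and II can be checked by evaluating the integral in (3) numerically. The sketch below rests on several assumptions: equation (3) is taken as (2/π) times the integral of (A − A0)/ω², attenuation is in nepers (A = −ln|T|), the overall transmission is the present-position path plus t_f times the smoothed rate, and the function names are chosen here for illustration. If those assumptions hold, both curves should give a phase slope of −t_f, that is, a lead equal to the prediction time, even though curve II has far less gain at high frequencies.

```python
import math

def curve_one(w, t_f):
    """Overall transmission with no smoothing: present position plus t_f times rate."""
    return 1 + 1j * w * t_f

def curve_two(w, t_f, alpha):
    """Same circuit with a (1 + i*w*alpha)^-1 smoothing network in the rate path."""
    return 1 + 1j * w * t_f / (1 + 1j * w * alpha)

def phase_slope_from_attenuation(transmission, u_min=-30.0, u_max=30.0, steps=12000):
    """Right-hand side of (3): (2/pi) * integral of (A - A0)/w^2 dw with A = -ln|T|.

    The substitution w = exp(u) turns the integrand into A(w)/w du, which the
    trapezoidal rule handles well because it decays at both ends.
    """
    h = (u_max - u_min) / steps
    total = 0.0
    for k in range(steps + 1):
        w = math.exp(u_min + k * h)
        attenuation = -math.log(abs(transmission(w)))   # A0 = 0 for these circuits
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * attenuation / w
    return (2.0 / math.pi) * h * total

T_F = 2.0      # prediction time in seconds, illustrative
ALPHA = 2.0    # smoothing time constant, set equal to t_f as in the text

slope_one = phase_slope_from_attenuation(lambda w: curve_one(w, T_F))
slope_two = phase_slope_from_attenuation(lambda w: curve_two(w, T_F, ALPHA))
high_frequency_gain = abs(curve_two(1e6, T_F, ALPHA))

print(f"curve I : phase slope {slope_one:.4f} s (negative means lead)")
print(f"curve II: phase slope {slope_two:.4f} s")
print(f"curve II amplitude at high frequency: {high_frequency_gain:.3f}")
```

Under these assumptions the smoothing network caps the high-frequency amplitude of curve II at 1 + t_f/α, where curve I grows without limit, and the extra gain it adds at low frequencies is what keeps the integral unchanged.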

In view of the inverse frequency scale in (3),
the gross negative attenuation will be minimized
if the negative attenuation region is
placed very close to zero frequency. This, however,
means that much of the signal energy
falls in the negative attenuation region so that
in certain respects, at least, the signal response
must be seriously injured. For example, in the
specific circuits just discussed we can place the
negative attenuation region at very low frequencies
by choosing very long time constants,
α, in the data-smoothing networks, with the
consequence that the circuits will operate correctly
for any long continued straight line path,
but will be very sluggish in changing from one
straight line to another. If the negative attenuation
region is placed at higher frequencies, on
the other hand, the signal response is improved
but beyond certain limits the circuit becomes
unbearably sensitive to noise.

Quantitative illustrations of these relationships
are quickly constructed. Suppose, for example,
that the prediction time is 2 seconds.
From (3) this is consistent with an attenuation
characteristic having zero attenuation
below ω = 1 and a net gain of π nepers thereafter.
In other words, the amplitudes of all
frequencies above ω = 1 are increased by a factor
of about 23 to 1. If the region of added
gain is pushed to a higher frequency or concentrated
within a narrow band, the multiplying
factor rapidly becomes larger. For example,
if we maintain A at approximately zero
below ω = 2, the average gain above this point
must be 2π nepers, corresponding to a multiplying
factor of 500 to 1. We secure the same
factor by attempting to concentrate the region
of negative attenuation in the band between
ω = 1 and ω = 2. The multiplying factor also
goes up rapidly as we increase the prediction
time. For example, with the gain uniformly
spread over the frequency region above ω = 1
the multiplying factor is 500 for a prediction
time of 4 seconds, or more than 10,000 for a
prediction time of 6 seconds.

Reasonable multiplying factors with long
prediction times can be obtained only by carrying
the negative attenuation region to very low
frequencies. As indicated previously, the cost
of this is an increase in the time required for
the signal to change from one constant or
nearly constant value to another. For example,
in the first illustration above, if the region
of π nepers net gain is carried down from
ω = 1 to ω = 0.2 the integral in (3) is just five
times as great as it was before, so that the
characteristic corresponds to a prediction time
of 10 rather than 2 seconds. This change
would correspond to an increaseᵍ from perhaps
4 or 5 to perhaps 20 or 25 seconds in the time
required for the circuit to settle from one constant
value to another.
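The multiplying factors in these illustrations follow from equation (3) once the gain is taken as uniform over a band. The sketch below assumes equation (3) in the form (2/π) times the integral of (A − A0)/ω², with square-cornered gain regions as in the text; the function names are chosen here for illustration.

```python
import math

def inverse_scale_width(w_low, w_high=math.inf):
    """Integral of dw / w^2 over the band, the weight the band carries in (3)."""
    return 1.0 / w_low - (0.0 if math.isinf(w_high) else 1.0 / w_high)

def required_gain_nepers(lead_seconds, w_low, w_high=math.inf):
    """Uniform gain (nepers) over the band that yields the given lead in seconds."""
    return lead_seconds * (math.pi / 2.0) / inverse_scale_width(w_low, w_high)

def lead_from_gain(gain_nepers, w_low, w_high=math.inf):
    """Lead (seconds) produced by a uniform gain in nepers over the band."""
    return (2.0 / math.pi) * gain_nepers * inverse_scale_width(w_low, w_high)

cases = [
    ("2 s lead, gain above w = 1", 2.0, 1.0, math.inf),
    ("2 s lead, gain above w = 2", 2.0, 2.0, math.inf),
    ("2 s lead, gain between w = 1 and 2", 2.0, 1.0, 2.0),
    ("4 s lead, gain above w = 1", 4.0, 1.0, math.inf),
    ("6 s lead, gain above w = 1", 6.0, 1.0, math.inf),
]
for label, lead, lo, hi in cases:
    gain = required_gain_nepers(lead, lo, hi)
    print(f"{label}: {gain / math.pi:.1f} pi nepers, factor about {math.exp(gain):.0f}")

# Carrying the pi-neper region down from w = 1 to w = 0.2 multiplies the lead by five.
print(f"lead with pi nepers above w = 0.2: {lead_from_gain(math.pi, 0.2):.1f} s")
```

The computed factors are exp(π), exp(2π), and exp(3π), which are comparable to the round figures quoted in the text.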

Practical examples of the transmission characteristics
of overall prediction circuits, with
particular emphasis on the dominant effect of
even very small negative attenuations at extremely
low frequencies, are shown later in
Figures 5 to 8, inclusive. In the linear predictor,
A − A0 varies as −kω² near zero, and it is
easily seen that such a term makes a finite contribution
to the integral in (3). On the other
hand, the attenuation of the quadratic predictor,
which is capable of dealing exactly with
polynomial functions of time of the second
degree or less, is necessarily zero at the originʰ
to terms of the order of ω⁴, so that the integral
in this region can be neglected. This slight
difference between the two characteristics at
frequencies of the order of 0.01 cycle per
second and below is sufficient to balance the
obviously greater negative attenuation of the
quadratic predictor at higher frequencies.

g Only rough numbers can be given, since circuits
with the square-cornered attenuation characteristics
chosen for illustrative purposes would have very ripply
transient characteristics, corresponding to no very well
marked settling time.

h See Quasi-Distortionless Prediction Networks in
Appendix A.

THE ASSUMPTION OF ANALYTIC ARCS


THE discussion in the previous two chapters
has been based upon the assumption
that the least squares criterion forms a suitable
measure of performance for a predicting
network. This assumption permitted us to restrict
our attention to the amplitude spectra
of the signal and noise, leaving phase relations
entirely out of account. Thus, both signal and
noise could be thought of as "random noise"
functions characterized by random phases and
Gaussian distributions, as described in the
preceding chapter. So far as the noise is concerned,
there seems to be nothing wrong with
this assumption. In the case of the signal, however,
it appears that significant phase relations
may exist. This chapter will consequently set
up an alternative analysis which permits the
significance of possible phase relations in the
target paths to be estimated.

The  alternative  analysis  is  based  upon  the 
assumption  that  the  target  courses  are  sequen- 
ces of  analytic  segments  of  different  lengths 
joined  together.  These  segments  are  simple 
predictable  curves  such  as  straight  lines,  pa- 
rabolas, and  circles.  Significant  phase  relations 
are  implied  by  the  assumption  that  there  are 
sudden  changes  from  one  type  of  course  to 
another. 

This  picture  of  target  paths  is,  of  course, 
extreme.  There  are  no  such  sharp  discontinui- 
ties between  one  segment  and  another,  nor  do 
airplanes  fly  perfectly  along  simple  curves 
even  for  limited  periods.  Nevertheless,  it  is 
the  conception  of  target  courses  upon  which 
the  rest  of  our  analysis  is  based.  The  reasons 
for  believing  that  it  is  a  closer  approximation 
to  actual  target  courses  than,  say,  a  random 
noise  function  with  the  same  power  spectrum 
would  be,  are  given  later.  Perhaps  more  im- 
portant is  the  fact  that  the  possibility  of  hit- 
ting an  airplane  flying  along  such  a  simple 
analytic  arc  is  much  greater  than  it  would  be 
if  we  were  attempting  to  predict  a  correspond- 
ing random  noise  function.  It  is  thus  advan- 
tageous to  take  the  analytic  arc  assumption  as 
a  basis  for  designing  the  prediction  circuit, 


even  if  the  assumption  seems  to  be  reasonably 
well  justified  over  only  occasional  segments  of 
actual  target  paths.  An  example  of  such  a 
situation  is  furnished  by  the  bombing  run 
illustration  described  in  Chapter  7. 

As a corollary to the analytic arc assumption
it is also assumed that the theoretical
predicted point must be quite close to the actual
target  position  if  the  probability  of  scoring  a 
hit  is  to  be  appreciable.  In  other  words,  such 
dispersive  factors  as  random  errors  in  com- 
puter or  gun  or  the  lethal  radius  of  the  shell, 
which  would  tend  to  produce  occasional  hits  at 
long  distances  from  the  theoretical  predicted 
point,  are  quite  small.  This  is  such  a  plausible 
assumption  in  the  light  of  present-day  antiair- 
craft experience  that  its  critical  importance  in 
the  present  argument  is  likely  to  go  unper- 
ceived.  However,  this  is  the  assumption  which 
limits  consideration  to  small  errors  in  predic- 
tion, whereas  the  least  squares  criterion  natu- 
rally gives  greatest  emphasis  to  large  errors. 
If,  for  example,  antiaircraft  projectiles  were 
suddenly  endowed  with  a  much  greater  de- 
structive radius,  we  would  be  much  more  in- 
terested in  fairly  large  misses,  and  the  objec- 
tions to  the  least  squares  criterion  would  disap- 
pear. 
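A toy calculation shows how the size of the destructive radius decides between the two criteria. Everything in it is hypothetical: a one-dimensional target with only two possible future positions, probabilities chosen for the example, and a hit counted whenever the target lies within the lethal radius of the aiming point.

```python
def hit_probability(aim, outcomes, lethal_radius):
    """Probability that the target ends up within lethal_radius of the aim point."""
    return sum(p for position, p in outcomes if abs(position - aim) <= lethal_radius)

# Hypothetical future positions (arbitrary units) with their probabilities:
# the target most likely continues straight (position 0) but may turn (position 10).
OUTCOMES = [(0.0, 0.7), (10.0, 0.3)]

mean_aim = sum(position * p for position, p in OUTCOMES)   # least squares choice
modal_aim = max(OUTCOMES, key=lambda item: item[1])[0]     # most probable position

for radius in (1.0, 8.0):
    print(f"lethal radius {radius}: "
          f"least squares aim -> {hit_probability(mean_aim, OUTCOMES, radius):.1f}, "
          f"modal aim -> {hit_probability(modal_aim, OUTCOMES, radius):.1f}")
```

With the small radius the least squares aiming point falls between the two possible positions and hits neither, while the most probable position scores whenever the target holds its course. With the large radius the least squares point covers both positions and does better.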

These  postulates  are  discussed  in  more  detail 
in  the  following  sections.  In  anticipation  of 
this  discussion  the  following  conclusions  may 
be  mentioned: 

1.  With  the  assumptions  as  stated,  the  pre- 
diction should  be  on  a  modal  rather  than  a 
least  squares  basis.  In  other  words,  the  gun 
should  be  aimed  at  the  most  probable  future 
position  of  the  target. 

2.  Modal  prediction  requires  evaluation  of 
the  parameters  of  the  analytic  arc  the  target 
is  at  present  traversing.  This  can  be  accom- 
plished by  smoothing  the  values  of  these  pa- 
rameters evaluated  for  a  period  in  the  past. 

3.  If  the  smoothing  is  performed  by  linear 
invariable  networks,  the  impulsive  admittances 
of  these  networks  should  have  a  definite  cutoff 
after a finite smoothing time. By this means
all data over a certain age are given zero weight.
The method of calculating the proper smoothing
time is developed.

4.  Definite  advantages  can  be  obtained  from 
circuits  with  variable  smoothing  times  if  such 
systems  can  be  satisfactorily  mechanized. 

THE  TARGET  COURSES 

The  target  courses,  like  the  tracking  errors, 
can  be  thought  of  as  a  statistically  generated 
set  of  functions — that  is,  a  stochastic  process. 
The  structure  of  this  process  is,  however,  very 
different  from  that  of  the  tracking  errors.  It 
is by no means satisfactory to assume the
target  courses  to  be  equivalent  to  a  random 
noise  having  the  same  power  spectrum  as  the 
target  courses.  As  we  pointed  out  in  Chapter 
7,  the  target  is  piloted  by  a  purposeful  human 
being.  It  tends  to  follow  a  definite  simple  curve 
for  a  period  of  time  and  then  to  shift  to  a  new 
simple  curve.  Much  of  the  flight  is  in  attempted 
straight  lines  with  constant  velocity.  Most  of 
the  remainder  can  be  considered  to  be  segments 
of  circles  or  helices  in  space,  or  as  segments  of 
parabolas  or  higher  degree  curves.  Straight 
line  constant  speed  flight  corresponds  to  the 
airplane  controls  in  a  neutral  position.  The 
helical  flight  is  a  natural  generalization  allow- 
ing arbitrary,  but  fixed,  positions  of  the  con- 
trols. The  curves  which  are  parabolic  functions 
of  time  correspond  to  constant  acceleration  in 
the  three  space  coordinates.   Thus,  all  these 
assumptions  have  a  reasonable  physical  back- 
ground. 

Most  antiaircraft  computers  are  constructed 
on  the  assumption  of  straight  line  flight,  al- 
though some  work  has  been  done  in  World 
War  II  on  curved  flight  directors  both  with  the 
helical  and  the  parabolic  assumptions.  There  is 
not  a  great  deal  of  difference  in  these  two 
generalizations  from  the  practical  point  of 
view,  since  determination  of  acceleration  terms 
is  subject  to  such  large  errors  in  any  case. 

The  important  part  of  this  representation 
of  the  target  courses  is  that  they  consist  of 
segments  of  simple  analytic  curves  joined  to- 
gether. The  individual  segments  are  completely 
predictable  if  we  have  a  part  of  the  segment 
given  exactly.  One  need  merely  evaluate  the 
parameters  of  the  segment  from  the  given  part 


and evaluate the curve for $t + t_f$. The unpredictable
part of the target courses is due to the
possibility  of  sudden  changes  from  one  segment 
to  another.  With  random  noise  functions  the 
unpredictableness  occurs  continuously. 

This  simplified  description  of  the  target 
courses  as  piecewise  analytic  functions  must 
be  recognized  as  only  a  first  approximation.  A 
more  complete  description  of  the  target  course 
would  include  the  "fine  structure,"  the  con- 
necting curves  between  the  various  analytic 
segments  and  the  deviations  from  the  segments 
due  to  random  air  disturbances  and  similar 
causes.  This  latter  effect,  the  wandering  of  the 
target  from  its  intended  path,  might  be  reason- 
ably well  represented  by  the  addition  of  a 
random  noise  function  to  the  piecewise  analytic 
functions  described  above. 

9.2  THE POISSON DISTRIBUTION OF SEGMENT END POINTS

The  analytic  segments  of  which  the  course 
is  supposed  to  consist  are  not  all  of  the  same 
duration  —  we  may  assume  some  probability 
distribution  of  the  duration  of  these  segments. 
The  simplest  assumption  here  is  that  the 
breaks  occur  in  a  Poisson  distribution  in  time. 
This  assumption  is  not  necessary  for  our 
analysis  but  is  a  reasonable  one  and  leads  to 
a  simple  mathematical  treatment.  Any  other 
reasonable  distribution  would  give  comparable 
results. 

A  series  of  events  is  said  to  occur  in  a 
Poisson  distribution  in  time  if  the  periods  be- 
tween successive  events  are  independent  in  the 
probability  sense  and  are  controlled  by  a  distri- 
bution function 

$$p(l)\,dl = \frac{1}{a}\, e^{-l/a}\, dl .$$

Here  p(l)dl  is  the  probability  of  an  interval  of 
length between l and l + dl. This means that
the  frequency  of  intervals  of  a  given  length  is 
a  decreasing  exponential  function  of  the  length. 
This type of distribution is familiar in physics
as describing the decay of radioactive substances.
The time a in the distribution function
is the average length of the intervals, since

$$\int_0^\infty l \cdot \frac{1}{a}\, e^{-l/a}\, dl = a .$$

It is related to the "half life" b of the interval
by 

b  =  a  In  2  . 

The  single  number  a  completely  specifies  the 
Poisson  distribution.  The  events  may  be  said 
to  be  happening  as  randomly  as  possible  apart 
from  the  fact  that  they  occur  at  an  average 
rate  of  1/a  per  second. 

Another  way  of  describing  a  Poisson  distri- 
bution of  events  is  the  following.  The  probabil- 
ity of  an  event  in  a  small  interval  of  duration 
dl is (1/a)dl and is independent of whether or
not  events  have  occurred  in  any  other  nonover- 
lapping  intervals. 
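
The two descriptions above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it draws segment durations from the exponential density and compares the sample mean with a and the fraction outlasting the half life with one half. The value of a, the sample size, the seed, and the helper names are all assumptions chosen for illustration.

```python
import math
import random

def sample_segment_lengths(a, n, seed=0):
    """Draw n segment durations with density (1/a) * exp(-l/a)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / a) for _ in range(n)]

def fraction_lasting(lengths, l):
    """Empirical chance that a segment lasts l seconds or more."""
    return sum(1 for x in lengths if x >= l) / len(lengths)

a = 20.0                                   # assumed average segment length, seconds
lengths = sample_segment_lengths(a, 100_000)
mean_length = sum(lengths) / len(lengths)  # should come out close to a
half_life = a * math.log(2)                # b = a ln 2
print(round(mean_length, 2), round(fraction_lasting(lengths, half_life), 3))
```

The sample mean should land near a and the second figure near one half, which is what the relation b = a ln 2 asserts.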


IBUTION 

S 


Let  us  suppose  that  we  have  a  record  of  the 
course  of  the  target  up  to  the  present  time  and 
a  complete  statistical  description  of  the  set  of 
target  courses.  What  can  then  be  said  about  the 
position of the target $t_f$ seconds from now? If
we  were  able  to  analyze  the  data  completely 
the  most  we  could  obtain  would  be  a  probability 
distribution  function  for  the  future  position. 
This  distribution  function  would  give  the  prob- 
ability, in  the  light  of  the  course  history,  of 
the  target  being  at  any  point  in  space  at  the 
future  time.  This  function  would  assume  large 
values at likely points and low values at unlikely
points. For $t_f$ small the distribution
would be highly concentrated and for larger $t_f$
it  would  tend  to  spread  out. 

In  the  simple  case  we  have  been  discussing, 
of  a  Poisson  distribution  of  sudden  changes  in 
type  of  course,  the  distribution  consists  of  two 
parts.  First,  there  is  a  spike  of  probability  at 
one  point,  the  continuation  of  the  present  pre- 
dictable segment.  Second,  there  is  a  continuous 
distribution  which  corresponds  to  possible 
changes  to  a  new  segment  during  the  time  of 
flight. As $t_f$ increases the total probability in
the  spike  decreases  exponentially  toward  zero, 
and  the  total  in  the  continuous  part  increases 
exponentially  toward  unity.  The  behavior  is 
roughly  as  indicated  in  Figure  1. 
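
For the Poisson case the split between the two parts can be written out explicitly. As a sketch, assuming the spike is lost as soon as any break occurs during the time of flight $t_f$, and that breaks arrive at the average rate 1/a:

$$\Pr(\text{spike}) = e^{-t_f/a}, \qquad \Pr(\text{continuous part}) = 1 - e^{-t_f/a} .$$

The two parts carry equal probability when $t_f = a \ln 2$, that is, when the time of flight equals the half life of a segment.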


Figure 1. Probability distribution of future position of target, assuming piecewise analytic courses.


A  very  different  type  of  future  position  dis- 
tribution is  exhibited  with  other  assumptions 
about  the  target  courses.  For  example,  suppose 
the  courses  were  random  noise  functions  with 
the  power  spectrum 

$$P(\omega) = \frac{A}{\alpha^2 + \omega^2} .$$

A  typical  noise  function  with  this  spectrum  is 
shown  in  Figure  2.  In  Figure  3  is  shown  a 
typical  velocity  under  the  other  assumption, 
that  the  courses  are  piecewise  analytic  and  in 
fact  straight  lines  between  breaks.  If  the 
breaks  are  Poisson  distributed,  both  Figure  2 
and  Figure  3  have  the  same  power  spectrum, 
$1/(\alpha^2 + \omega^2)$. The future distribution of velocities
for Figure 3 is shown in Figure 1, and for
Figure  2,  it  will  be  as  shown  in  Figure  4.  In  the 
random noise case the future distribution is a
Gaussian distribution with no spike. The center
of this distribution decreases exponentially toward
zero with increasing time of flight according
to the formula

$$X_{t_f} = X_0\, e^{-\alpha t_f}$$

where $X_0$ is the present value of the function
and $X_{t_f}$ is the mean of the future distribution.


Figure 2. Typical noise function.

The standard deviation $\sigma$ of the distribution
increases exponentially toward the rms value of
the function according to


u  =  A(l  -  e-*"/). 

Supposing  that  this  distribution  function 
could  be  determined,  where  should  the  gun  be 
aimed?  The  answer  to  this  will  depend  on  two 
factors:  the  gun  dispersion,  and  the  lethal 


Figure 3. Typical velocity function.


effects  of  the  shell.  If  the  gun  is  aimed  to 
explode  the  shell  at  a  certain  point  in  space, 
the  shell  will  not  necessarily  explode  at  that 
point,  but  rather  there  will  be  a  distribution  of 
positions  centered  about  the  point  aimed  at, 
because  of  gun  dispersion.  Also,  if  the  shell 
explodes  at  a  certain  point  and  the  target  is  at 


another  point,  there  will  be  a  certain  proba- 
bility of  lethal  effect  which  decreases  rapidly 
with  increasing  distance  between  the  points. 
These  two  functions  could  be  combined  by  a 
product  integration  to  give  the  probability  of 
a hit if the target is at one point and


Figure 4. Probability distribution of future position of target, assuming courses with random noise properties.


the  gun  aimed  to  explode  the  shell  at  a  second 
point.  To  determine  the  probability  of  a  hit 
when  aiming  at  a  certain  point,  then,  we  should 
multiply  the  probability  of  the  target  being  at 
each  point  in  space  by  the  probability  of  lethal 
effect  when  it  is  at  that  point  and  integrate  the 
product  over  all  space.  The  optimum  point  of 
aim  will  be  the  one  which  maximizes  this  in- 
tegrated product. 

In  one  dimension  this  may  be  expressed 
mathematically as follows. Let P(x) be the
future position distribution of the target, so
that  P(x)dx  is  the  probability  of  it  being  in 
the  interval  from  x  to  x  +  dx  at  the  future  time. 
Let  Q(x,y)  be  the  probability  of  hitting  the 
target  if  the  gun  is  aimed  at  point  y  and  the 
target  is  at  point  x.  Then  the  total  probability 
of  a  hit  when  aiming  at  point  y  is 


$$R(y) = \int_{-\infty}^{\infty} P(x)\, Q(x,y)\, dx .$$


The  point  of  aim  y  should  be  chosen  to  maxi- 
mize R(y). 

In  the  cases  we  consider,  the  lethal  radius  of 
the  shell  and  the  dispersion  of  the  gun  are  both 
assumed  to  be  small  in  comparison  with  the 
range  of  future  positions  if  there  is  a  change 
of  course  during  the  time  of  flight.  This  means 
that Q(x,y) is small unless x is very near to y.
Q(x,y) can be, in fact, considered to be a $\delta$
function of (x - y), and the value R(y) is then
just  a  constant  times  P(y).  Thus,  the  best 
aiming  point  under  this  assumption  is  the  most 
probable  future  position  of  the  target.  The  as- 
sumption of  small  lethal  distance  is  generally 
valid  with  antiaircraft  fire  and  ordinary  chemi- 
cal explosive  shells. 
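
The argument can be illustrated numerically in one dimension. The sketch below is hypothetical throughout: it places a spike of probability at x = 0, models the continuous part as a broad Gaussian, takes Q(x,y) to be a narrow Gaussian in (x - y), and evaluates the integrated product for a range of aiming points. None of the numbers (spike mass, spreads, lethal distance) comes from the text.

```python
import math

def hit_probability(y, spike_mass, cont_mean, cont_sd, lethal_r, n=2001):
    """R(y) for a spike of probability at x = 0 plus a Gaussian continuous part."""
    def q(x):
        # Assumed form of Q(x, y): narrow Gaussian in the miss distance x - y.
        return math.exp(-0.5 * ((x - y) / lethal_r) ** 2)

    total = spike_mass * q(0.0)                       # contribution of the spike
    lo = cont_mean - 8.0 * cont_sd
    dx = 16.0 * cont_sd / (n - 1)
    norm = cont_sd * math.sqrt(2.0 * math.pi)
    for i in range(n):                                # rectangle rule over the continuous part
        x = lo + i * dx
        density = math.exp(-0.5 * ((x - cont_mean) / cont_sd) ** 2) / norm
        total += (1.0 - spike_mass) * density * q(x) * dx
    return total

params = dict(spike_mass=0.5, cont_mean=150.0, cont_sd=300.0, lethal_r=10.0)
aim_points = range(-600, 601, 10)                     # candidate aiming points, yards
best_aim = max(aim_points, key=lambda y: hit_probability(y, **params))
mean_position = (1.0 - params["spike_mass"]) * params["cont_mean"]
print(best_aim, hit_probability(0.0, **params), hit_probability(mean_position, **params))
```

With these assumed figures the best aiming point coincides with the spike, and aiming at the mean of the distribution gives a far smaller chance of a hit, since almost none of the spike falls within the lethal distance.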

Now  the  most  probable  future  position  in  our 
case  is  the  spike  of  probability  corresponding 
to  the  analytic  extrapolation  of  the  present  seg- 
ment of  the  target  course.  To  determine  its 
position  one  must  find  the  parameters  of  this 
segment and evaluate for $t_f$ seconds in the
future. For example, if the segments are assumed
to be straight lines (constant velocity
target) the velocity components are determined
and multiplied by $t_f$ to give the predicted
change in position. These changes are added to
the present position to give the future position.
If helical or parabolic segments are assumed,
the parameters of these curves are determined
from the past data, and the curves extrapolated
$t_f$ seconds into the future.

These  conclusions  may  be  contrasted  with 
the  idea  of  aiming  at  the  point  which  mini- 
mizes the  mean  square  error.  The  least  squares 
criterion  amounts  to  aiming  at  the  mean  or 
center  of  gravity  of  the  future  distribution  of 
position.  This  point  will  ordinarily  be  under 
the  continuous  part  of  the  distribution  and  not 
at  the  spike;  e.g.,  the  point  marked  in  Figure  1. 
Its  position  depends  to  a  considerable  extent  on 


distant parts of the distribution, which would
surely be complete misses in any case. The
chief advantage of the least squares criterion
is that it fits in well with the mathematical
tools suitable to these problems, leading to
solvable equations.

The least squares criterion will still appear
in our analysis in that we attempt to smooth
our course parameters in such a way as to
minimize the mean square error in these, a
very different thing from minimizing the mean
square error in the predicted position of the
target.

9.4  NECESSITY OF A SHARP CUTOFF

The changes in the course parameters between
adjacent segments can be very large.
Also,  at  the  start  of  operations  and  in  changing 
from  one  target  to  another  there  will  be  large 
and  erratic  variation  of  the  input  to  the 
smoothing  and  predicting  circuits,  unrelated  to 
the  present  target  course.  If  any  of  these  data 
are  used  in  prediction,  the  result  will  almost 
surely  be  a  miss  because  of  the  small  lethal 
radius  of  the  shell.  The  only  way  to  eliminate 
these  errors  in  a  linear  invariable  system  is  to 
have  all  weighting  functions  cut  off  sharply 
after a short time. Then all data over a certain
age are eliminated. Hits will occur only when
the target has been on a predictable segment for
this length of time or more and remains there
at least $t_f$ seconds in the future.

Suppose  the  weighting  function  for  velocity 
has  a  1  per  cent  tail  beyond  the  cutoff  point 
and  that  the  trackers  start  following  the  target 
from  a  zero  position.  Then  after  the  smoothing 
time  there  will  be,  because  of  the  lack  of  exact 
cutoff,  a  1  per  cent  error  in  velocity.  If  the 
time  of  flight  were  15  seconds  and  the  target 
velocity  200  yards  per  second,  this  represents 
an error of 30 yards in predicted position.
Since  this  is  comparable  to  the  other  errors  in 
a  typical  director,  we  conclude  that  the  tail  of 
the  smoothing  curve  should  not  be  much  greater 
than  1  per  cent  of  its  total  area. 
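
The arithmetic behind this estimate, using only the figures quoted above, is

$$0.01 \times 200\ \text{yd/s} = 2\ \text{yd/s}, \qquad 2\ \text{yd/s} \times 15\ \text{s} = 30\ \text{yd} .$$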

9.5  CALCULATION OF THE BEST SMOOTHING TIME

Under  the  assumptions  we  have  made,  the 
proper  smoothing  time  to  maximize  the  number 
of hits can be determined as follows. Let P(l)
be the probability that a predictable segment
of the course lasts for l seconds or more. In
the Poisson case this function is

$$P(l) = e^{-l/a} .$$

With  a  given  smoothing  time  S  there  will  be  a 
certain  probability  of  hitting  the  target,  as- 
suming it  has  been  on  the  present  segment  for 
S  seconds  in  the  past  and  will  remain  there  for 
tf  seconds  in  the  future.  We  assume  changes 
in  course  to  be  so  large  that  any  change  re- 
sults in  a  miss.  This  probability  of  a  hit  Q(S), 
provided  it  remains  on  the  course,  will  be  an 
increasing  function  of  S.  Ordinarily  the  stand- 
ard deviation  will  decrease  as  the  square  root 
of  the  smoothing  time.  We  have  assumed  the 
lethal  radius  of  the  shell  small  compared  to  the 
dispersion  of  shells  about  the  target.  The  prob- 
ability of  a  hit  will  then  vary  inversely  with 
the  volume  through  which  the  shells  are  dis- 
persed. If  the  gun  itself  had  no  dispersion  but 
all  errors  were  due  to  tracking  errors  (and  if 
the tracking error spectrum is flat), the probability
of a hit would then vary as $K S^{3/2}$ for
S in the region of interest. This is because
there are three dimensions and the expected
error in each of these is decreasing as $S^{-1/2}$.
With gun dispersion present, Q(S) will have
the form

$$Q(S) = K\left(\sigma_1^2 + \sigma_2^2\,\frac{a}{S}\right)^{-3/2}$$

where $\sigma_1$ is the standard deviation due to the
gun dispersion, and $\sigma_2\sqrt{a/S}$ that due to tracking
errors. The sum of the squares is the total
variance in each dimension and the three-halves
power gives the total dispersion volume.

When these two functions P(l) and Q(S)
are known, the best smoothing time is that
which maximizes the product

$$P(S + t_f) \cdot Q(S) .$$

The first term is the probability of a predictable
segment of the course lasting $S + t_f$ seconds,
and the second term is the probability of
a hit if it does last that long. Therefore, the
product is the probability of a hit with smoothing
time S.

In the Poisson case, with no gun dispersion,
the calculation is as follows:

$$P(l) = e^{-l/a}$$

$$P(S + t_f) = e^{-(S + t_f)/a} = A\, e^{-S/a}$$

$$Q(S) = K S^{3/2}$$

$$f(S) = P(S + t_f)\, Q(S) = B\, e^{-S/a}\, S^{3/2}$$

$$f'(S) = B\left[ e^{-S/a}\, \tfrac{3}{2}\, S^{1/2} - \tfrac{1}{a}\, e^{-S/a}\, S^{3/2} \right] = 0$$

$$S = \tfrac{3}{2}\, a$$

The proper smoothing time is 3/2 of the average
segment length, and is independent of the
time  of  flight  and  all  other  factors. 
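
The result can be checked by direct numerical maximization of the product P(S + t_f) Q(S). The sketch below is illustrative only: the function names, the grid search, and all numerical values are assumptions. It uses the form of Q(S) with gun dispersion given earlier, so that setting sigma1 to zero recovers the case just calculated, and a nonzero sigma1 shows how a dispersion term moves the maximum.

```python
import math

def hit_probability(S, a, t_f, sigma1=0.0, sigma2=1.0):
    """Relative hit probability P(S + t_f) * Q(S) for smoothing time S."""
    survival = math.exp(-(S + t_f) / a)            # P(S + t_f) in the Poisson case
    variance = sigma1 ** 2 + sigma2 ** 2 * a / S   # total variance in each dimension
    return survival * variance ** -1.5             # three dimensions: power -3/2

def best_smoothing_time(a, t_f, sigma1=0.0, sigma2=1.0, steps=5000):
    """Grid search over 0 < S <= 5a for the S that maximizes hit_probability."""
    grid = [5.0 * a * k / steps for k in range(1, steps + 1)]
    return max(grid, key=lambda S: hit_probability(S, a, t_f, sigma1, sigma2))

a = 20.0                                           # assumed average segment length, seconds
no_dispersion = best_smoothing_time(a, t_f=15.0)
other_flight_time = best_smoothing_time(a, t_f=5.0)
with_dispersion = best_smoothing_time(a, t_f=15.0, sigma1=2.0, sigma2=1.0)
print(no_dispersion / a, other_flight_time / a, with_dispersion / a)
```

Without dispersion the maximum falls at 1.5 a for either time of flight. With the assumed sigma1 = 2 sigma2 it falls near 0.5 a, a considerably shorter smoothing time.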

The  presence  of  gun  dispersion  and  computer 
errors  which  are  independent  of  smoothing 
time  decreases  the  best  S  from  this  value.  In 
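this case the equation for optimal S is the
quadratic

$$\sigma_1^2\,\frac{S^2}{a^2} + \sigma_2^2\,\frac{S}{a} - \frac{3}{2}\,\sigma_2^2 = 0 ;$$

hence

$$\frac{S}{a} = \frac{-\sigma_2^2 + \sigma_2\sqrt{\sigma_2^2 + 6\sigma_1^2}}{2\sigma_1^2} .$$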


Here $\sigma_1$ is the part of the errors which is independent
of smoothing time (dispersion
errors in the computer, etc.) and $\sigma_2\sqrt{a/S}$ is the error
which varies inversely with the square root of
S, $\sigma_2$ being its value at S = a. Ordinarily $\sigma_1$ is
several times $\sigma_2$, in which case we have approximately

$$\frac{S}{a} \cong \frac{\sigma_2}{\sigma_1}\sqrt{\frac{3}{2}} .$$

There  are  other  factors  which  we  have  neg- 
lected, which  decrease  the  best  smoothing  time 
still  further.  The  wandering  of  the  target  about 
the  predictable  segments  assumed  in  the  above 
simplified  analysis  makes  old  data  less  reliable 
and  therefore  reduces  S.  Also,  there  is  the  tac- 
tical consideration  that  when  starting  to  track 
a  target  it  is  desirable  to  commence  firing  as 
soon  as  possible,  even  if  reducing  this  time 
makes  individual  hits  somewhat  less  probable. 
For these and other reasons the best smoothing
time will be just a fraction of a.

9.6  NONLINEAR AND VARIABLE SYSTEMS

The  compromise  required  in  choosing  a  cer- 
tain definite  smoothing  time  can  be  eliminated 
by  the  use  of  nonlinear  elements.  In  particular, 
if  a  method  is  devised  for  determining  when 
changes  of  course  occur,  this  indication  can  be 
used  to  start  a  new  linear  but  variable  smooth- 
ing operation,  so  that  the  device  uses  all  the 
data  pertinent  to  the  present  segment  and  no 
data  from  previous  segments.  There  is  a  clear 
improvement  in  such  cases  although  not  so 
great  as  might  be  expected.  There  are  many 
practical  difficulties  in  proper  adjustment  of 
such  a  "trigger"  action.  If  the  trigger  is  too 
sensitive  it  will  assume  new  segments  due 
merely  to  tracking  noise  and  seldom  allow  suffi- 
cient smoothing  for  accurate  fire.  If  it  is  too 
insensitive  it  fails  in  its  function  of  quickly 


locating  changes  of  segment.  Since  the  noise 
and  target  courses  are  subject  to  considerable 
variation, this adjustment is not easy.

In  such  a  system  the  smoothing  may  be 
linear — the  only  nonlinearity  is  the  tripping 
circuit.  The  analysis  of  best  weighting  func- 
tions, etc.,  given  in  later  chapters  can  for  the 
most  part  be  applied  to  such  cases.  There  may 
also  be  advantages  to  be  derived  from  making 
the  smoothing  operator  depend  on  the  general 
position  in  space  of  the  target  relative  to  the 
gun.  The  smoothing  time  may  be  varied,  for 
example,  as  a  function  of  the  time  of  flight. 
This  type  of  variation  would  be  slow  compared 
to  the  noise  frequency,  and  here  again  the 
linear  analysis  can  be  used. 

Whether  any  real  advantage  can  be  obtained 
by  "strongly"  nonlinear  smoothing  in  practical 
cases other than these two possibilities is questionable.

The analytic arc assumption described in
the previous chapter immediately allows us
to reduce a vast proportion of data-smoothing
problems to a relatively concrete form. Obviously
the arc will be specified by a number of
parameters  and  the  principal  object  of  the  com- 
puting and  data-smoothing  circuits  must  be  to 
isolate  values  of  these  parameters  on  the  basis 
of  which  a  prediction  can  be  made.  In  practi- 
cal cases  the  instantaneous  values  of  the 
parameters  are  isolated  by  coordinate  con- 
verters. The  function  of  the  data-smoothing 
circuit  is  to  provide  a  suitable  average  from 
these instantaneous values. This is called
"smoothing a constant" here since the parameters
are assumed to be constant along each
arc,  although  they  may  change  radically  from 
one  arc  to  another. 

The  data-smoothing  network  is  most  con- 
veniently specified  by  its  impulsive  admittance. 
(See  Appendix  A.)  In  accordance  with  the 
assumptions  made  in  the  previous  chapter,  it 
will  be  assumed  that  the  desired  impulsive  ad- 
mittance is  identically  zero  after  some  limiting 
time  T.  Thus,  T  seconds  after  a  change  from 
one  analytic  arc  to  the  next  the  new  parameter 
value  is  established.  T  is  the  so-called  "settling 
time"  of  the  data-smoothing  network. 

With  the  settling  time  limit  given,  the  prob- 
lem of  choosing  a  suitable  data-smoothing  net- 
work reduces  to  that  of  finding  the  best  shape 
of  the  impulsive  admittance  characteristic  for 
t  <  T.  Obviously  this  shape  determines  how 
the  output  of  the  network  changes  in  going 
from  the  parameter  value  appropriate  for  the 
first  arc  to  that  appropriate  for  the  second.  The 
exact  way  in  which  the  response  settles  from 
one  constant  value  to  the  next  is,  however, 
usually  of  comparatively  little  interest.  The 
shape  of  the  weighting  function  is  of  impor- 
tance chiefly  because  of  its  effect  on  the  noise. 
For  each  noise  spectrum  there  is,  in  principle, 
an  optimum  shape  for  the  weighting  function. 
The  present  chapter  approaches  the  problem  of 
choosing  a  shape  which  will  minimize  the  effect 
of  noise  from  several  points  of  view. 


It  should  be  noted  that  the  term  noise  as  used 
here  does  not  necessarily  refer  to  the  errors 
associated  directly  with  the  tracking  data.  The 
tracking  data  may  have  been  subjected  to  co- 
ordinate conversions,  differentiations,  or  other 
processes  of  computation  before  reaching  the 
data-smoothing network.¹ The noise associated
with  the  signal  to  be  smoothed  thus  will  usually 
have  characteristics  differing  from  those  of  the 
noise  associated  with  the  tracking  data. 

10.1  EXPONENTIAL SMOOTHING

Before  attacking  the  problem  of  smoothing  a 
constant  in  a  systematic  way  it  is  worth  while 
to  consider  an  important  special  case.  This  is 
the  so-called  exponential  smoothing  circuit.  It 
leads  to  a  data-smoothing  network  in  which 
the  output  V  is  related  to  the  input  E  by 


V(t) 


r)  dr 


so  that  the  impulsive  admittance  W(t)  is  an 
exponential  function  of  time,  as  illustrated  by 
Figure  1. 


Figure 1. Simple exponential weighting function.

An  impulsive  admittance  of  the  type  shown 
in  Figure  1  does  not  show  any  very  definite 
settling  time.  The  exponential  curve  ap- 
proaches zero  gradually,  and  it  is  a  long  time 
after  a  change  in  course  before  the  effects  of 
the data obtained on the old course are negligible.
This is obviously an undesirable result,
and the exponential weighting function is consequently
not a recommended one for situations
to which the analytic arc assumption applies.
The exponential solution is, however, described
here because it occurs in such a vast variety of
cases. It is found, in fact, whenever the data-smoothing
device is specified by a linear first-order
differential equation with constant coefficients.
It may thus correspond to many simple
situations. For example, this is the result
which would be obtained in an electrical circuit
if we smoothed the data by placing a simple
shunt capacity across a resistance circuit. In
mechanical structures it is encountered whenever
the damping depends either upon simple
inertia or a simple compliance.

¹ In exceptional circumstances the physical apparatus
in which these processes are carried out may also be
sources of additional noise.

Simple  exponential  smoothing  also  occurs  in 
a  variety  of  other  situations  which  may  be 
somewhat  less  obvious.  For  example,  it  is  the 
effective  result  in  either  an  aided  laying  or  a 
regenerative  tracking  scheme  whenever  the 
ratio between rate and displacement corrections
is fixed. Another somewhat similar example
is furnished by the feedback amplifier
circuit shown in Figure 2. Since rapid fluctuations
in the output of this amplifier are fed
back through the capacity and tend to oppose
the input voltage, the structure acts as a
smoother, and more detailed analysis would
show that it has characteristics similar to those
obtained by using a shunt capacity across a
resistance circuit. The structure is introduced
here because considerable use is made of it in
connection with the discussion of nonlinear
smoothing in a later chapter.

Figure 2. Feedback amplifier circuit giving simple exponential weighting function.

One  simple  conclusion  about  data-smoothing 
networks  can  be  drawn  immediately  from  this 
discussion.  Since  all  structures  simple  enough 
to be specified by a first-order differential equation
give exponential smoothing, which has no
very  well-marked  settling  time,  it  is  clear  that 
a  data-smoothing  network  which  shows  a  well- 
defined  settling  time  must  probably  be  at  least 
moderately  complicated. 
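
The contrast between the two kinds of weighting can be made concrete by following the output after a sudden change of parameter value, taken here as a unit step. The sketch below is an illustration under stated assumptions: the exponential weighting is written as (1/tau) exp(-t/tau) with a hypothetical time constant tau, and it is compared with a weighting that is uniform over a settling time T. Choosing tau = T/2 gives both weightings the same average data age.

```python
import math

def exponential_step_response(t, tau):
    """Output t seconds after a unit step, for weighting (1/tau) * exp(-t/tau)."""
    return 1.0 - math.exp(-t / tau)

def uniform_step_response(t, T):
    """Output t seconds after a unit step, for weighting 1/T on 0 < t < T."""
    return min(t / T, 1.0)

T = 10.0        # assumed settling time of the finite-window network, seconds
tau = T / 2.0   # assumed time constant giving the same mean data age, T/2
for t in (T, 2.0 * T, 3.0 * T):
    leftover_exponential = 1.0 - exponential_step_response(t, tau)
    leftover_uniform = 1.0 - uniform_step_response(t, T)
    print(t, round(leftover_exponential, 4), leftover_uniform)
```

The uniform weighting has forgotten the old value completely at t = T, whereas the exponential weighting still carries a residue of exp(-T/tau) of the old value at that time, which shrinks but never vanishes.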

10.2  CURVE-FITTING METHOD

Consider the signal E shown in Figure 3
under the assumption that the true signal is
constant and the superposed noise is random
with a flat spectrum. The best constant A, in
the least squares sense, which can be fitted to
the signal from t - T to t is that which minimizes

$$\int_{t-T}^{t} [A - E(\lambda)]^2\, d\lambda ,$$

viz.,

$$A = \frac{1}{T}\int_{t-T}^{t} E(\lambda)\, d\lambda . \tag{1}$$

Figure 3. Piecewise constant signal with noise.


Comparing  this  with  equation  (2),  Appendix 
A,  it  will  be  seen  that  A,  which  is  obviously  a 
function  of  t,  is  the  response  to  the  assumed 
signal  of  a  network  whose  impulsive  admit- 
tance is 


$$W(t) = \frac{1}{T}, \qquad 0 < t < T . \tag{2}$$


This  is  the  best  weighting  function  for  smooth- 
ing under  the  assumed  circumstances.  It  is 
illustrated  in  Figure  4. 
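
The step from the least squares condition to the uniform weighting function can be written out as follows. Setting the derivative with respect to A of the squared error to zero gives

$$\frac{d}{dA}\int_{t-T}^{t} [A - E(\lambda)]^2\, d\lambda = 2AT - 2\int_{t-T}^{t} E(\lambda)\, d\lambda = 0 ,$$

so that A is the plain average of the signal over the last T seconds. With the substitution $\tau = t - \lambda$ this average becomes

$$A = \int_0^T \frac{1}{T}\, E(t - \tau)\, d\tau ,$$

which is a weighted sum of past data in which every age between 0 and T receives the same weight 1/T.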

Figure 4. Best weighting function for smoothing piecewise constant signal.

A more complex situation is one in which the
true signal is a line of constant slope with
superposed flat random noise, as shown in Figure 5.
For convenience the analysis will be
conducted in terms of the age variable $\tau = t - \lambda$.

Figure 5. Piecewise linearly varying signal with noise.


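The best straight line $A - B\tau$ which can be fitted
to the signal from $\tau = 0$ to $\tau = T$ is that
which minimizes

$$\int_0^T [A - B\tau - E(t-\tau)]^2\, d\tau .$$

Hence A and B must satisfy simultaneously

$$A - \frac{T}{2}\,B = \frac{1}{T}\int_0^T E(t-\tau)\, d\tau , \qquad \frac{T}{2}\,A - \frac{T^2}{3}\,B = \frac{1}{T}\int_0^T \tau\, E(t-\tau)\, d\tau . \tag{3}$$

Eliminating A, we get

$$B = \frac{6}{T^3}\int_0^T (T - 2\tau)\, E(t-\tau)\, d\tau ,$$

whence by partial integration

$$B = \frac{6}{T^3}\int_0^T E'(t-\tau) \cdot \tau(T - \tau)\, d\tau .$$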

Comparing  this  with  (7),  Appendix  A,  it  will 
be  seen  that  B,  which  is  obviously  a  function  of 
t,  is  the  response  to  the  derivative  of  the  as- 
sumed signal  of  a  network  whose  impulsive 
admittance  is 


$$W(t) = \frac{6}{T^3}\, t\,(T - t), \qquad 0 < t < T . \tag{4}$$


This  is  the  best  weighting  function  for  smooth- 
ing the  derivative  of  the  signal  under  the  as- 
sumed circumstances.  It  is  illustrated  in  Fig- 
ure 6  and  is  generally  referred  to  as  the  "para- 
bolic weighting  function." 
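
The equivalence between fitting a straight line and applying the parabolic weighting to the derivative can be checked numerically. The sketch below is illustrative only: the test signal (a ramp with a deterministic ripple standing in for noise), the helper names, and the sample counts are all assumptions. It fits the line A - B*tau by ordinary least squares on closely spaced samples and compares B with the integral of the signal's derivative against 6 tau (T - tau) / T^3.

```python
import math

def signal(s):
    """Assumed test signal: ramp of slope 2 plus a small ripple."""
    return 3.0 + 2.0 * s + 0.5 * math.sin(5.0 * s)

def signal_rate(s):
    """Exact time derivative of signal(s)."""
    return 2.0 + 2.5 * math.cos(5.0 * s)

def least_squares_slope(t, T, n=20000):
    """Slope B of the line A - B*tau fitted to signal(t - tau) for 0 < tau < T."""
    taus = [(i + 0.5) * T / n for i in range(n)]
    vals = [signal(t - tau) for tau in taus]
    mean_tau = sum(taus) / n
    mean_val = sum(vals) / n
    cov = sum((tau - mean_tau) * (v - mean_val) for tau, v in zip(taus, vals))
    var = sum((tau - mean_tau) ** 2 for tau in taus)
    return -cov / var          # regression slope against age is -B

def parabolic_smoothed_rate(t, T, n=20000):
    """Integral of signal_rate(t - tau) * 6 tau (T - tau) / T^3 over 0 < tau < T."""
    d = T / n
    total = 0.0
    for i in range(n):         # midpoint rule
        tau = (i + 0.5) * d
        total += signal_rate(t - tau) * 6.0 * tau * (T - tau) / T ** 3 * d
    return total

b_fit = least_squares_slope(10.0, 2.0)
b_weighted = parabolic_smoothed_rate(10.0, 2.0)
print(b_fit, b_weighted)
```

The two numbers agree to within the accuracy of the numerical integration, as the partial integration requires.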


Figure 6. Best weighting function for smoothing piecewise linearly varying signal.

It should be noted also that the right-hand
member of the first of equations (3) is formally
the same as that of equation (1). Hence
the response of the network specified by (2)
and illustrated in Figure 4, to the type of
signal  shown  in  Figure  5,  will  correspond  to 
the  value  on  the  best  straight  line  T/2  seconds 
back  from  t,  the  present  time.  This  network  is 
still  the  best  for  smoothing  the  signal,  but  it 
introduces  a  delay  of  one  half  of  the  smooth- 
ing time.  The  delay  may  be  reduced  only  at 
the  price  of  a  reduction  in  smoothing  unless  the 
smoothing  time  is  increased. 
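
The delay can be seen directly in the noise-free case. As a sketch, take a signal of constant slope, $E(\lambda) = c + k\lambda$, where c and k are arbitrary constants introduced here for illustration. The uniform weighting then gives

$$\frac{1}{T}\int_{t-T}^{t} (c + k\lambda)\, d\lambda = c + k\left(t - \frac{T}{2}\right) = E\!\left(t - \frac{T}{2}\right),$$

that is, the value the signal had T/2 seconds before the present time.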

AUTOCORRELATION  METHOD 

The autocorrelation method with finite settling time was first used by G. R. Stibitz in numerical determination of the best weighting function for smoothing the derivative of tracking data with typical tracking errors. This method was also used to determine the sensitivity of smoothing to departures of the weighting function from the best form.

The analysis is based upon the expression

$$V(t) = \int_0^T g'(t - \tau)\, W(\tau)\, d\tau , \qquad t > T$$

for the response to the derivative of the error time function g(t) of a network whose impulsive admittance or weighting function W(t) is identically zero for t > T as well as for t < 0.
Since measured tracking errors are generally tabulated only at 1-second intervals, the integral may be approximated by the sum

$$V(t) = \sum_{m=1}^{T} \left[ g(t - m + 1) - g(t - m) \right] W_{m-(1/2)}$$

for integral values of t.

The instantaneous transmitted power is the square of this expression, and the average transmitted power is

$$P_{av} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=T+1}^{T+N} V^2(t) .$$

This may be expressed in the form

$$P_{av} = \sum_{m=1}^{T} \sum_{n=1}^{T} W_{m-(1/2)} \cdot C_{m-n} \cdot W_{n-(1/2)} \tag{5}$$


where 


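$$C_{m-n} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=T+1}^{T+N} \left[ g(t - m + 1) - g(t - m) \right] \left[ g(t - n + 1) - g(t - n) \right]$$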


is the autocorrelation of the errors. Having computed the autocorrelation, (5) may be minimized with respect to the W's by familiar methods, under the constraint


$$\sum_{m=1}^{T} W_{m-(1/2)} = 1 .$$

The values of W thus obtained are the specification of the best weighting function.ᵇ Equation (5) may then be used to determine the sensitivity of smoothing to departures of the weighting function from the best form.
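The constrained minimization of (5) is a small linear-algebra problem. Using a Lagrange multiplier, the best weights are proportional to the solution of C u = 1, rescaled so that they sum to unity. The Python sketch below illustrates this under stated assumptions: the function names, the elimination routine, and the example autocorrelation values (2 at zero lag, -1 at unit lag, as first differences of uncorrelated unit-variance errors would give) are hypothetical and are not taken from Stibitz's computations.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def best_weights(C):
    """Minimize sum_m sum_n W_m C[m][n] W_n subject to sum(W) = 1."""
    u = solve(C, [1.0] * len(C))
    total = sum(u)
    return [v / total for v in u]

def transmitted_power(W, C):
    """Quadratic form (5) for a given set of weights."""
    n = len(W)
    return sum(W[m] * C[m][n_] * W[n_] for m in range(n) for n_ in range(n))

if __name__ == "__main__":
    # Hypothetical autocorrelation matrix C[m][n] = C_{m-n} for three weights.
    C = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    W = best_weights(C)
    print(W, transmitted_power(W, C))
```

For this assumed autocorrelation the weights come out as a sampled parabola, and their transmitted power is lower than that of uniform weights, which matches the continuous result for flat noise on the position data.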

Proceeding along this line, Stibitz found that the best weighting function for typical actual tracking errors was generally intermediate to the uniform and parabolic ones shown in Figures 4 and 6. Furthermore, Stibitz found that the difference in smoothing obtained from the best weighting function on the one hand and from the uniform or the parabolic weighting function on the other hand, is negligible in practice.

The autocorrelation method was later formalized by R. S. Phillips and P. R. Weiss who incorporated it into a theory of prediction.⁷ A brief exposition of this formulation is given in Appendix B.

10.4 ELEMENTARY PULSE METHOD

For the purposes of this method, an elementary noise pulse is defined by a time function $F_0(t)$ which satisfies the following requirements:

1. Identically zero when t < 0.

2. Contains no terms which increase exponentially with time.

3. Power spectrum $N(\omega^2)$ is the same as that of the noise.

The noise is then regarded as the result of elementary noise pulses started at random. Alternatively, it may be regarded as the result of flat random noise passed through a network whose transmission function is $S(p) = L[F_0(t)]$. As a matter of fact, only S(p) is required in the analysis, and this is readily determined from the relation

$$|S(i\omega)|^2 = N(\omega^2) ,$$

together with the condition that $S(i\omega)$ corresponds to the transmission function of a minimum-phase physical structure (cf. Appendix B).

The response F(t) to the elementary noise pulse $F_0(t)$ of a network whose impulsive admittance is W(t) is given by the operational equation

$$F(t) = S(p) \cdot W(t)$$

in accordance with the footnote in Section A.5, Appendix A. The best form for W(t) is therefore that which minimizes the integral

$$\int_0^\infty [F(t)]^2\, dt \tag{6}$$

under the restriction

$$\int_0^{t_0} W(t)\, dt = 1 \qquad \text{when } t_0 > T . \tag{7}$$


ᵇ The computations involved may be considerably reduced by noting the symmetry property proved in Section B.2, Appendix B.


This is as much of the elementary pulse method as we shall need in order to reconsider the cases treated in Section 10.2. For the treatment of more general cases the method is described in greater detail in Appendix B.

The minimization of the integral (6) under the restriction (7) reduces to a simple isoperimetric problem in the calculus of variations, in cases in which S(p) is a polynomial in p. It is essential first of all, however, to note that if S(p) is of degree n, the integral (6) will converge only if W(t) is differentiable at least n times. In other words, W(t) must have continuous derivatives of all orders up to the (n-1)th inclusive, although the nth derivative may have finite discontinuities. In particular, if W(t) is to be zero outside of 0 < t < T, its derivatives of orders up to the (n-1)th inclusive must vanish at both t = 0 and t = T. These 2n boundary conditions must be imposed on the solution of the Euler equation which in this case is

$$S(p) \cdot S(-p) \cdot W(t) = \lambda .$$

Here $\lambda$ is a constant parameter which is finally adjusted so that the restriction (7) is satisfied.

The first case treated in Section 10.2 is one in which $N(\omega^2) = 1$, whence S(p) = 1 and F(t) = W(t). The integral (6) is a minimum under the restriction (7) if W(t) is constant by intervals. The restriction (7) then requires W(t) to be of the form (2).

The case of first derivative smoothing treated in 10.2 is one in which $N(\omega^2) = \omega^2$, whence S(p) = p and $F(t) = W'(t)$. If the integral (6) is to converge at all, $W'(t)$ must not have discontinuities of impulsive or higher type; in other words, W(t) must be continuous through all values of t. The integral is a minimum under the restriction (7) if $W''(t)$ is constant by intervals. The restriction (7) then requires W(t) to be of the form (4).

These results may be generalized immediately. In whatever way the signal to be smoothed may have been derived from the tracking data, let the power spectrum of the noise associated with it be $N(\omega^2) = \omega^{2n}$. Then $S(p) = p^n$ and $F(t) = W^{(n)}(t)$. If the integral (6) is to converge at all, $W^{(n-1)}(t)$ must be continuous through all values of t. The integral is a minimum under the restriction (7) if $W^{(2n)}(t)$ is constant by intervals. The restriction (7) then requires W(t) to be of the form

$$W(t) = \frac{(2n+1)!}{(n!)^2} \cdot \frac{1}{T} \left[ \frac{t}{T} \left( 1 - \frac{t}{T} \right) \right]^n , \qquad 0 < t < T. \tag{8}$$

It may be noted that the convergence requirements which arise in the foregoing discussion are directly related to the discussion and theorem in Section A.8, Appendix A, with respect to the relationship between discontinuities in the impulsive admittance and its derivatives on the one hand, and the ultimate cutoff characteristic of the transmission function on the other hand. The continuity of $W^{(n-1)}(t)$ is obviously required to make the transmission fall off ultimately at the rate of 6(n+1) db per octave against the rise of 6n db per octave in the noise power spectrum.

The integral (6) may also be used to evaluate the relative advantage of the best weighting function over another weighting function. As an example, consider the case where the weighting function (2) is the best. The value of the integral (6) in this case is 1/T. If the weighting function (4) is used against the same noise, the value of the integral (6) is 6/5T. Hence, as far as rms error or standard deviation is concerned, the second weighting function is $\sqrt{5/6}$ or 0.913 as efficient as the first.
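The figures quoted above can be reproduced by evaluating the family (8) and the integral (6) numerically. The sketch below is illustrative only; the function names and the midpoint quadrature are assumptions, and the flat-noise case S(p) = 1 is taken so that the integral (6) reduces to the integral of the square of W(t).

```python
import math

def best_weight(n, t, T):
    """Weighting function (8) for a noise power spectrum omega**(2n)."""
    if not 0.0 < t < T:
        return 0.0
    u = t / T
    coeff = math.factorial(2 * n + 1) / (math.factorial(n) ** 2 * T)
    return coeff * (u * (1.0 - u)) ** n

def integrate(f, a, b, steps=20000):
    """Composite midpoint rule (an illustrative choice of quadrature)."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def flat_noise_integral(n, T):
    """Integral (6) with S(p) = 1: the integral of W(t)**2 over 0..T."""
    return integrate(lambda t: best_weight(n, t, T) ** 2, 0.0, T)

def relative_efficiency(T):
    """rms efficiency of the parabolic (n = 1) weighting relative to the uniform (n = 0)."""
    return math.sqrt(flat_noise_integral(0, T) / flat_noise_integral(1, T))

if __name__ == "__main__":
    T = 2.0  # hypothetical smoothing time
    print(flat_noise_integral(0, T), flat_noise_integral(1, T), relative_efficiency(T))
```

The efficiency ratio does not depend on the smoothing time chosen, since both integrals scale as 1/T.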
THE THEORY of "smoothing a constant" developed in the preceding chapter will be extended in this chapter to the problem of smoothing a polynomial function of time of any prescribed degree. The extension is, however, restricted to the case of a flat noise spectrum. In addition to the smoothing problem, the analysis also provides a way of designing a network which will extrapolate the polynomial a given distance $t_f$ into the future. The network is so arranged that $t_f$ is continuously variable. In addition, the degree of the polynomial can readily be changed to fit changes in the complexity of the assumed form of the data, apart from noise.

It is clear that these results amount, in a certain sense, to an alternative to Wiener's method for the design of prediction circuits for general time series. Thus, to predict a time series of any given complexity we would need only to begin with a polynomial of sufficiently high degree to fit the observed data, and extrapolate. Aside from the restriction to a flat noise spectrum, perhaps the most obvious difference from Wiener's method is the fact that the settling time restriction limits the data upon which the prediction rests to a finite interval in the past. To advance such a prediction theory seriously, however, it would be necessary to go much farther into the way in which the degree of the polynomial is established and the justification for assuming that the extrapolated value represents a probable future value for the function.ᵃ

This general discussion will not be undertaken here. Since prediction with high degree polynomials will certainly be sensitive to minor irregularities in the data, tracking errors would necessarily limit the application of the method in any case. If we confine ourselves to reasonably low degree polynomials, however, the method is useful. An example is furnished by the prediction of airplane position, in rectangular coordinates, by quadratic functions of time. Here the square terms represent the effects of accelerations in the various coordinates. We can defend the inclusion of such terms on the ground that it is plausible to assume that an airplane may experience constant accelerations, due to turns, the force of gravity, etc., for considerable periods of time. The linear term represents plane velocity and needs no defense. The constant term, of course, gives the plane position at some reference time. Including it in the smoothing operation is equivalent to introducing "present-position" smoothing of the sort suggested by the broken lines in Figure 1 of Chapter 7.ᵇ

ᵃ As an example of possible difficulties we may notice the fact that two polynomials of different degree which approximate a given function as closely as possible, in a least squares sense, in a prescribed interval frequently differ radically outside that interval.

Aside from its direct interest as a possible prediction method, the analysis in this chapter is also of indirect interest for the additional light it sheds on the effect of the noise spectrum on smoothing functions. It turns out that smoothing a power of time, with a flat noise spectrum, is equivalent to smoothing a constant with a somewhat different noise spectrum. Thus the smoothing functions developed for polynomials are also useful as special cases of smoothing functions applicable to constants.


11.1


Let $\lambda$ be any past value of time and let t be the present value. If the data is fitted with a smooth curve $\bar{E}(\lambda)$, the predicted value may be taken as $\bar{E}(t + t_f)$. The procedure of fitting is the familiar one of minimizing the integral

$$\int_{-\infty}^{t} \left[ E(\lambda) - \bar{E}(\lambda) \right]^2 W_0(t, \lambda)\, d\lambda$$

with respect to disposable parameters in $\bar{E}(\lambda)$ and a prescribed weighting function $W_0(t, \lambda)$. The lower limit of the integral is indicated as $-\infty$ in compliance with the physical impossibility of discriminating between relevant and irrelevant data, with fixed linear networks, except on the basis of age. The burden of discrimination must be relegated to the weighting function which must be a function only of the age $t - \lambda$. Under the ideal restriction that $W_0(t - \lambda)$ is identically zero when $t - \lambda > T$ or $\lambda < t - T$, the indicated lower limit of the integral is purely nominal.

ᵇ In the circuit of Figure 1, Chapter 7, however, the smoothing network would produce a lag in the present-position data delivered to the prediction circuit, and this lag would, of course, mean some error in following a moving target. In the method described in this chapter such lags are automatically compensated for by adjustments in the coefficients of the other terms of the polynomial.

As in Section 10.2, it is convenient to conduct the analysis in terms of the age variable $\tau = t - \lambda$ introduced there. If

$$F(\tau) = E(\lambda) , \qquad \bar{F}(\tau) = \bar{E}(\lambda) ,$$

the integral to be minimized may be expressed in the form

$$\int_0^\infty \left[ F(\tau) - \bar{F}(\tau) \right]^2 W_0(\tau)\, d\tau . \tag{1}$$

In accordance with the discussion of quasi-distortionless transmission networks in Section A.10, Appendix A, the smooth curve $\bar{E}(\lambda)$ should be a polynomial in $\lambda$. Hence $\bar{F}(\tau)$ should be a polynomial in $\tau$. It will be more convenient, however, to express $\bar{F}(\tau)$ formally as a linear combination of polynomials in $\tau$ which may be orthogonalized. Hence, let

$$\bar{F}(\tau) = V_0 + V_1 \cdot G_1(\tau) + V_2 \cdot G_2(\tau) + \cdots + V_n \cdot G_n(\tau) \tag{2}$$

where $G_m(\tau)$ is an mth degree polynomial in $\tau$. Let $W_0(\tau)$ be normalized in the sense that

$$\int_0^\infty W_0(\tau)\, d\tau = 1$$

and the $G_m(\tau)$ be orthogonalized with respect to the weighting function $W_0(\tau)$ in the sense that

$$\int_0^\infty G_l(\tau)\, G_m(\tau)\, W_0(\tau)\, d\tau = \begin{cases} 0 & \text{if } l \neq m \\ 1/k_m & \text{if } l = m \end{cases}$$

$(G_0 = 1,\ k_0 = 1)$.

The integral (1) is then a minimum with respect to the $V_m$'s in (2) if

$$V_m = k_m \int_0^\infty F(\tau) \cdot G_m(\tau) \cdot W_0(\tau)\, d\tau . \tag{3}$$

In terms of the forward time $\lambda$, (2) and (3) reduce to

$$\bar{E}(\lambda) = V_0(t) + V_1(t) \cdot G_1(t - \lambda) + V_2(t) \cdot G_2(t - \lambda) + \cdots + V_n(t) \cdot G_n(t - \lambda) \tag{4}$$

where

$$V_m(t) = k_m \int_{-\infty}^{t} E(\lambda) \cdot G_m(t - \lambda) \cdot W_0(t - \lambda)\, d\lambda . \tag{5}$$


Expression (5) identifies the $V_m(t)$ as the responses to $E(\lambda)$ of fixed linear networks whose impulsive admittances are

$$W_m(\tau) = k_m\, G_m(\tau) \cdot W_0(\tau) . \tag{6}$$

By (4), the predicted value may be obtained by a linear combination of the responses of these networks, viz.,

$$\bar{E}(t + t_f) = V_0(t) + G_1(-t_f) \cdot V_1(t) + G_2(-t_f) \cdot V_2(t) + \cdots + G_n(-t_f) \cdot V_n(t) . \tag{7}$$

A schematic representation of an nth order smoothing and prediction circuit, based on (7), is shown in Figure 1, where the $G_m(-t_f)$ are represented as potentiometer factors dependent on the time of flight.

Figure 1. Schematic representation of nth order smoothing and prediction circuit.

Alternatively, (7) may be written

$$\bar{E}(t + t_f) = \bar{E}(t) + \left[ G_1(-t_f) - G_1(0) \right] \cdot V_1(t) + \cdots + \left[ G_n(-t_f) - G_n(0) \right] \cdot V_n(t) \tag{8}$$

where $\bar{E}(t)$ is then replaced by E(t) when position data smoothing is to be omitted.

It is not necessary that the $G_m(\tau)$ polynomials be orthogonal. However, the circuit switching required to reduce or increase the order of the prediction is simplest when the $G_m(\tau)$ polynomials are orthogonal. Orthogonal polynomials corresponding to any weighting function $W_0(\tau)$ are readily derived by well-known methods.
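One such well-known method is Gram-Schmidt orthogonalization of the powers of the age variable. The Python sketch below is an illustration under stated assumptions, not the procedure of the original text: the function names are hypothetical, the inner product is evaluated by Simpson's rule over a finite range, and the polynomials are scaled to unit leading coefficient, which is an arbitrary normalization.

```python
def integrate(f, a, b, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def evaluate(coeffs, tau):
    """Evaluate a polynomial given by its coefficients in ascending powers."""
    return sum(c * tau**i for i, c in enumerate(coeffs))

def orthogonal_polynomials(weight, upper, degree):
    """Monic polynomials G_0..G_degree orthogonal under `weight` on 0..upper."""
    def inner(p, q):
        return integrate(lambda x: evaluate(p, x) * evaluate(q, x) * weight(x), 0.0, upper)

    basis = []
    for m in range(degree + 1):
        g = [0.0] * m + [1.0]  # start from tau**m
        for q in basis:
            proj = inner(g, q) / inner(q, q)
            padded = q + [0.0] * (len(g) - len(q))
            g = [gi - proj * qi for gi, qi in zip(g, padded)]
        basis.append(g)
    return basis

if __name__ == "__main__":
    # Uniform weighting over a unit smoothing time (an assumed example).
    for g in orthogonal_polynomials(lambda tau: 1.0, 1.0, 3):
        print(g)
```

Any other scaling of the polynomials may be applied afterwards; the orthogonality is unaffected and the factors $k_m$ change accordingly.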

The weighting function $W_0(\tau)$ may be determined by either of the methods described in Appendix B as the best weighting function for smoothing position data, under prescribed tracking error characteristics. Then the best impulsive admittances $W_m(\tau)$ for a smoothing and prediction circuit, are prescribed by (6).

The relationship (6) shows that if the prescribed weighting function $W_0(\tau)$ satisfies the formal requirements for physical realizability, so will all of the impulsive admittances $W_m(\tau)$. Of the standard sets of orthogonal polynomials those of Laguerre appear to be the best adapted to physical realization. The Laguerre polynomials $L_n^{(\alpha)}(\tau)$ are orthogonal in $0 < \tau < \infty$ with the weighting function $\tau^{\alpha} e^{-\tau}$. However, such a weighting function is, in general, very unsatisfactory from the practical point of view of settling characteristics.

It is possible of course to approximate any prescribed weighting function $W_0(\tau)$ as closely as may be desired in a physically realizable form, derive a set of orthogonal polynomials based on the approximate form, and determine the impulsive admittances $W_m(\tau)$ from (6). However, such a procedure leads to complexities of network configuration which increase very rapidly with the index m. This increasing complexity is hardly justifiable in practice.

From the foregoing considerations, it appears that the most practical procedure is to derive all of the impulsive admittances $W_m(\tau)$ without regard to physical realizability, approximate them independently in physically realizable forms of independently prescribed complexities, and modify or redetermine the potentiometer factors in accordance with the discussion in Section A.10, Appendix A.

11.2 WEIGHTING FUNCTIONS FOR DERIVATIVES

The impulsive admittances defined by (6) for m > 0 may not be regarded as weighting functions even though the response of the corresponding networks to $E(\lambda)$ is, by (5)

$$V_m(t) = \int_0^\infty E(t - \tau) \cdot W_m(\tau)\, d\tau ,$$

because, with the exception of $W_0(\tau)$, the $W_m(\tau)$, as will presently be seen, cannot be normalized. The term weighting function is reserved for the functions defined by (11) below.

Since $\tau^r$ is a linear combination of the $G_s(\tau)$ where s = 0, 1, ..., r, it is obvious from (6) that

$$\int_0^\infty \tau^r\, W_m(\tau)\, d\tau = 0 \qquad \text{when } r < m .$$

In particular

$$\int_0^\infty W_m(\tau)\, d\tau = 0 \qquad \text{when } m > 0 .$$

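Since the transmission function $Y_m(p)$ of a network is the Laplace transform of its impulsive admittance (see Section A.3), we have

$$Y_m(p) = \int_0^\infty W_m(\tau)\, e^{-p\tau}\, d\tau = \sum_{r=0}^{\infty} \frac{(-p)^r}{r!} \int_0^\infty \tau^r\, W_m(\tau)\, d\tau . \tag{9}$$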

The first m terms in this series vanish. Hence $Y_m(p)$ will be of the form

$$Y_m(p) = p^m\, y_m(p) \tag{10}$$

where $y_m(0) \neq 0$. This permits us to regard the network whose impulsive admittance is $W_m(\tau)$ as an instantaneous mth order differentiator, corresponding to the factor $p^m$ in (10), in tandem with a purely smoothing network whose transmission function is $y_m(p)$.

It is convenient to associate a weighting function $w_m(\tau)$ with the purely smoothing network whose transmission function is $y_m(p)$. Dividing (10) through by $p^m$ the resulting operational equation may be interpreted (see Section A.5) to mean that the weighting function $w_m(\tau)$ is the m-fold integral of the impulsive admittance $W_m(\tau)$ between the limits 0 and $\tau$. This is expressed by

$$w_m(\tau) = \int_0^\tau \cdots \int_0^\tau W_m(\tau)\, (d\tau)^m . \tag{11}$$

By a relationship similar to (9) between $y_m(p)$ and $w_m(\tau)$, it follows from $y_m(0) \neq 0$ that

$$\int_0^\infty w_m(\tau)\, d\tau \neq 0 .$$

Hence the $w_m(\tau)$ may be normalized in the sense that

$$\int_0^\infty w_m(\tau)\, d\tau = 1$$

for all values of m. However, this may be done in general only if the $G_m(\tau)$ polynomials are not normalized in the sense that $k_m = 1$ for any value of m > 0. It is in fact readily shown that the coefficient of $\tau^m$ in $G_m(\tau)$ must be the same as that of $\tau^m$ in $e^{-\tau}$.

11.3 LEGENDRE POLYNOMIALS

The Legendre polynomials $P_m(x)$ are orthogonal with respect to the range $-1 < x < 1$ and uniform weighting. In other words, the polynomials $P_m(2\tau - 1)$ are orthogonal with respect to the range $0 < \tau < \infty$ and the weighting functionᶜ

$$W_0(\tau) = \begin{cases} 1 & \text{when } 0 \le \tau \le 1 \\ 0 & \text{when } \tau > 1 . \end{cases}$$

It is known from Section 10.4 that this form for the weighting function $W_0(\tau)$ is best in case the tracking errors are flat random noise. In the integral (1) to be minimized, the $G_m(\tau)$ polynomials should then be

$$G_m(\tau) = (-)^m\, \frac{m!}{(2m)!}\, P_m(2\tau - 1) .$$

The first few of these are tabulated below.

  m    G_m(τ)
  0    1
  1    1/2 - τ
  2    1/12 - τ/2 + τ²/2
  3    1/120 - τ/10 + τ²/4 - τ³/6

ᶜ The unit of time being equal to the nominal smoothing time.

With the help of the formula

$$\int_{-1}^{1} \left[ P_m(x) \right]^2 dx = \frac{2}{2m + 1}$$

it is readily determined that

$$\frac{1}{k_m} = \int_0^1 \left[ G_m(\tau) \right]^2 W_0(\tau)\, d\tau = \frac{(m!)^2}{(2m)!\,(2m+1)!} .$$

Then, by (6)

$$W_m(\tau) = \begin{cases} (-)^m\, \dfrac{(2m+1)!}{m!}\, P_m(2\tau - 1) & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases}$$

Substituting this in turn into (11) and making use of Rodrigues' formula

$$P_m(x) = \frac{(-)^m}{2^m\, m!}\, \frac{d^m}{dx^m} (1 - x^2)^m$$

or

$$P_m(2\tau - 1) = \frac{(-)^m}{m!}\, \frac{d^m}{d\tau^m} \left[ \tau (1 - \tau) \right]^m$$

it is finally found that

$$w_m(\tau) = \begin{cases} \dfrac{(2m+1)!}{(m!)^2} \left[ \tau (1 - \tau) \right]^m & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases} \tag{12}$$

By a relationship of the form of (9) the transmission functions $y_m(p)$ corresponding to the weighting functions $w_m(\tau)$ may be determined. The first three are

$$y_0(p) = \frac{1 - e^{-p}}{p}$$

$$y_1(p) = \frac{6}{p^3} \left[ (p - 2) + (p + 2)\, e^{-p} \right]$$

$$y_2(p) = \frac{60}{p^5} \left[ (p^2 - 6p + 12) - (p^2 + 6p + 12)\, e^{-p} \right] .$$

These may be written in the form

$$y_m(p) = Q_m(\omega) \cdot e^{-p/2} \tag{13}$$

where, with $p = i\omega$ and $x = \omega/2$,

$$Q_0(\omega) = \frac{\sin x}{x} , \qquad Q_1(\omega) = \frac{3}{x^2} \left( \frac{\sin x}{x} - \cos x \right) , \qquad Q_2(\omega) = \frac{15}{x^4} \left[ (3 - x^2)\, \frac{\sin x}{x} - 3 \cos x \right] \tag{14}$$

or in the infinite power-series form

$$y_0(p) = \sum_{n=0}^{\infty} \frac{(-p)^n}{(n+1)!} , \qquad y_1(p) = 6 \sum_{n=0}^{\infty} \frac{n+1}{(n+3)!}\, (-p)^n , \qquad y_2(p) = 60 \sum_{n=0}^{\infty} \frac{(n+1)(n+2)}{(n+5)!}\, (-p)^n . \tag{15}$$
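The prediction formula (7) with uniform weighting over a unit smoothing time can be tried out directly. The Python sketch below is illustrative only: it assumes the first three polynomials $G_m(\tau)$ of the Legendre family with leading coefficient $(-1)^m / m!$, the factors $k_m = (2m)!\,(2m+1)!/(m!)^2$, and Simpson's rule for the integrals in (5); the function names and the quadratic test signal are hypothetical.

```python
import math

def integrate(f, a, b, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

# Polynomials G_0, G_1, G_2 for uniform weighting on 0 <= tau <= 1.
G = [
    lambda tau: 1.0,
    lambda tau: 0.5 - tau,
    lambda tau: 1.0 / 12.0 - tau / 2.0 + tau**2 / 2.0,
]

def k(m):
    """Normalizing factor k_m = (2m)! (2m+1)! / (m!)**2."""
    return math.factorial(2 * m) * math.factorial(2 * m + 1) / math.factorial(m) ** 2

def coefficient(E, t, m):
    """V_m(t) from (5) with W_0 = 1 on 0 <= tau <= 1."""
    return k(m) * integrate(lambda tau: E(t - tau) * G[m](tau), 0.0, 1.0)

def predict(E, t, t_f, order=2):
    """Predicted value (7): sum of G_m(-t_f) * V_m(t) for m = 0..order."""
    return sum(G[m](-t_f) * coefficient(E, t, m) for m in range(order + 1))

if __name__ == "__main__":
    E = lambda t: 1.0 + 2.0 * t + 3.0 * t**2  # noise-free quadratic track (assumed)
    print(predict(E, 5.0, 0.7), E(5.7))
```

For noise-free data that is exactly a polynomial of degree not exceeding the order of the circuit, the extrapolation is exact; with noisy data the same weights give the least-squares polynomial over the last unit of time.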


Methods for obtaining physically realizable approximations to the weighting functions $w_m(\tau)$ or impulsive admittances $W_m(\tau)$, based upon the Q functions (14) and the series expansions (15) are described in Chapter 12.

PHYSICAL REALIZATION OF DATA-SMOOTHING FUNCTIONS


This chapter will be devoted to a brief review of some of the methods and techniques which have been used in the physical realization of data-smoothing or weighting functions. The first two sections will be devoted to methods for determining physically realizable approximations to a desired weighting function. The third section takes up the use of feedback amplifiers and servomechanisms in order to avoid the use of coils of generally fantastic sizes. The final section takes up the design of resistance-capacitance networks.

Methods of deriving physically realizable approximations of best weighting functions may be divided into two classes, which may be called, for convenience, t-methods and p-methods. The t-methods are those in which a prescribed best weighting function W(t) is approximated directly by a function $W_a(t)$ of realizable form, viz., a sum of decaying exponential terms and exponentially decaying sinusoidal terms. However, the t-methods are most useful when the approximation is restricted to a sum only of exponential terms. According to the discussion in Section A.9, Appendix A, such a restriction corresponds physically to passive RC transmission networks. A t-method was used by Phillips and Weiss in the reference quoted in Section 10.3 to obtain an approximation with one decaying exponential term and one exponentially decaying sinusoidal term. However, this method rapidly becomes unwieldy as the number of terms is increased.

The p-methods are those in which the approximation is derived indirectly from the transmission function Y(p) corresponding to W(t). A rational function $Y_a(p)$ approximating Y(p) is first determined. If it is realizable, and it usually is, then $W_a(t) = L^{-1}[Y_a(p)]$. In general, $Y_a(p)$ will have complex poles and, therefore, $W_a(t)$ will have exponentially decaying sinusoids as well as simple exponentials. This gives the p-methods a considerable advantage over the t-methods in more efficient use of network elements. The fact that this generally calls for impractical element values in passive RLC networks is not serious. As shown in Section 12.3, the use of coils may be avoided entirely by the use of feedback amplifiers.

12.1 t-METHODS

To describe the t-method,ᵃ let

$$W_a(t) = A_1 e^{-a_1 t} + A_2 e^{-a_2 t} + \cdots + A_n e^{-a_n t} \tag{1}$$

where the a's are prescribed and the A's are to be determined. Two considerations are involved in the determination of the A's. The first consideration is based on the relationship between the continuity conditions at t = 0 and the ultimate slope of the loss characteristic as expressed in the theorem in Section A.8. Accordingly, a number of relations of the type

$$\begin{aligned} A_1 + A_2 + \cdots + A_n &= 0 \\ a_1 A_1 + a_2 A_2 + \cdots + a_n A_n &= 0 \\ &\ \,\vdots \\ a_1^r A_1 + a_2^r A_2 + \cdots + a_n^r A_n &= 0 \qquad r < n - 1 \end{aligned} \tag{2}$$

must be satisfied. This leaves n - r - 1 of the A's for the second consideration.

The second consideration concerns the manner in which the approximation in the range t > 0 is to be made. The approximation may, for example, be required to pass through n - r - 1 points on W(t) or, the first n - r - 1 moments of the approximation may be required to be equal to the corresponding moments of W(t). The latter is expressed by relations of the type

$$\frac{A_1}{a_1^s} + \frac{A_2}{a_2^s} + \cdots + \frac{A_n}{a_n^s} = \frac{1}{(s-1)!} \int_0^\infty W(t)\, t^{s-1}\, dt , \qquad s = 1, 2, \cdots, n - r - 1 . \tag{3}$$
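Because relations (2) and (3) are linear in the A's, the coefficients follow from a single linear solve once the a's are prescribed. The Python sketch below illustrates this for the parabolic weighting function over a unit smoothing time, W(t) = 6t(1 - t), with three exponentials, the first of (2) only (r = 0), and two moment relations. The function names, the elimination routine, and the particular a's (a geometric series of ratio 3:2 starting at 2) are assumptions made for the example, not values from Foster's work.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def t_method_coefficients(a, moments, r=0):
    """Solve relations (2) (powers 0..r) and (3) (s = 1..n-r-1) for the A's.

    moments[s - 1] is the integral of W(t) * t**(s - 1) over t >= 0.
    """
    n = len(a)
    rows, rhs = [], []
    for j in range(r + 1):                       # continuity relations (2)
        rows.append([ai**j for ai in a])
        rhs.append(0.0)
    for s in range(1, n - r):                    # moment relations (3)
        rows.append([1.0 / ai**s for ai in a])
        rhs.append(moments[s - 1] / math.factorial(s - 1))
    return solve(rows, rhs)

def approximation(A, a, t):
    """W_a(t) of the form (1)."""
    return sum(Ai * math.exp(-ai * t) for Ai, ai in zip(A, a))

if __name__ == "__main__":
    a = [2.0, 3.0, 4.5]          # assumed exponents
    moments = [1.0, 0.5]         # area and first moment of 6 t (1 - t) on 0..1
    A = t_method_coefficients(a, moments)
    print(A, approximation(A, a, 0.5))
```

With these assumed a's the solution is A = (-1.2, 12, -10.8). It meets the continuity and moment conditions by construction, but the sketch does not address how closely it follows the parabola or how well it settles.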

Foster's investigations were concerned only with the parabolic weighting function (4) Chapter 10, so that only the first of (2) was involved. Numerical studies led to the belief that, with a given number of a's, the best approximation was to be had from the case in which all of the a's are equal. Hence the natural center of attention was the special form

$$W_a(t) = \left( A_1 t + A_2 t^2 + \cdots + A_{n-1} t^{n-1} \right) e^{-at} . \tag{4}$$

ᵃ The t-method is principally due to R. M. Foster.

At large values of t this expression reduces approximately to the last term, and if it is assumed that $A_{n-1} = 1$, the settling condition fixes a to at least a first approximation. The rest of the work of approximating the parabola is then equivalent to a problem in polynomial approximation. Once the A's are determined, a better value of a can be found from the settling condition, and the process gone through again.

If the a's are only approximately equal, the approximation will still behave approximately like (4) with an average value used for a. The difficulty with equal or nearly equal a's is that it leads to networks with extreme element values. In order to secure satisfactory element values, it is generally necessary to depart substantially from the condition of equal a's. This results in some, but not a large, loss of efficiency in approximating the parabola. Foster recommends that the a's be chosen as a geometric series, with their geometric mean more or less around the equivalent point for equal a's. With four a's he suggests that the constant ratio in the series may be 3:2, whereas with only two a's the ratio should be raised to 2:1. These are, however, only rough values and obviously depend on individual opinion of what constitutes an unreasonable element value.

As a matter of experience, it turns out that the characteristic first obtained usually has a rather long and slowly decaying tail, as shown in Figure 1. This, of course, is equivalent to a correspondingly long "settling time," or time before a useful prediction can be made. In practice, therefore, after the preliminary design has been found, adjustments are made to bring the tail of the curve under control, partly by modifying the values of the A's slightly, and partly by contracting the time scale to bring the part of the tail which remains appreciable within the allowable settling time limits. This leads to the somewhat lopsided match to the parabola shown in Figure 2.

Figure 1. Approximation to parabolic weighting function, showing poor settling characteristic.

Figure 2. Approximation to parabolic weighting function, showing better settling characteristic.

A method of bringing the tail of the curve under controlᵇ is to minimize the expression

$$\int_T^\infty \left[ W_a(t) \right]^2 dt = \sum_l \sum_m C_{lm} A_l A_m \tag{5}$$

where

$$C_{lm} = \frac{e^{-(a_l + a_m) T}}{a_l + a_m}$$

under the restrictions (2) and all but the last of (3).
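The quadratic form (5) is simply the closed-form value of the tail integral for a sum of exponentials, which can be confirmed numerically. The Python sketch below is illustrative; the function names, the truncation of the infinite range, and the sample A's and a's are assumptions.

```python
import math

def tail_energy(A, a, T):
    """Integral of W_a(t)**2 from T to infinity via the quadratic form (5)."""
    n = len(a)
    return sum(
        A[l] * A[m] * math.exp(-(a[l] + a[m]) * T) / (a[l] + a[m])
        for l in range(n)
        for m in range(n)
    )

def tail_energy_numeric(A, a, T, upper=20.0, steps=50000):
    """Midpoint-rule check of the same integral, truncated at `upper`."""
    h = (upper - T) / steps
    total = 0.0
    for k in range(steps):
        t = T + (k + 0.5) * h
        w = sum(Ai * math.exp(-ai * t) for Ai, ai in zip(A, a))
        total += w * w
    return total * h

if __name__ == "__main__":
    A = [-1.2, 12.0, -10.8]   # assumed coefficients
    a = [2.0, 3.0, 4.5]       # assumed exponents
    print(tail_energy(A, a, 1.0), tail_energy_numeric(A, a, 1.0))
```

Since (5) is quadratic in the A's and the restrictions are linear, the minimization again reduces to linear equations once the a's are fixed.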

The t-method used by Phillips and Weiss is based on a 3-term approximation of the form (1) in which one a is real while the other two may be conjugate complex. The a's are not prescribed, so that there are six parameters to be determined. Four restrictions are imposed, viz., the first of (2), the first of (3), a restriction on the value of the tail area, viz.,

$$\int_T^\infty W_a(t)\, dt = \sum_{l=1}^{3} \frac{A_l\, e^{-a_l T}}{a_l} ,$$

and the cross-over condition

$$W_a(T) = 0 .$$

Finally, the transmitted noise power, which, under the assumption of flat random noise associated with the position data, takes the form (see Section 10.4)

$$\int_0^\infty \left[ W_a(t) \right]^2 dt$$

is minimized with respect to the two remaining parameters by numerical methods.

ᵇ Used by R. F. Wick.




12.2 p-METHODS

Three  p-methods  have  been  used.  These  will 
be  described  in  chronological  order. 

The  first  p-method  is  one  which  was  used  by 
R.  L.  Dietzold  in  exploiting  the  use  of  feedback 
amplifiers  to  secure  the  advantages  of  approxi- 
mations with  complex  exponentials.  The  trans- 
mission function  Y(p)  corresponding  to  the 
best  weighting  function  W(t)  is  first  formu- 
lated. The  loss  characteristic,  -20  log,„  \  Y(im)  |, 
is  next  computed  and  plotted  against  the  fre- 
quency on  a  logarithmic  scale.  Then  standard 
equalizer  design  techniques  are  employed  to  ap- 
proximate the  loss  characteristic,  keeping  in 
mind  that  the  transmission  loss  in  the  feedback 
network  of  a  feedback  amplifier  becomes  a 
transmission gain for the circuit as a whole.

The second p-method is merely a more complete analytic formulation of the first, thereby avoiding the necessity for employing equalizer design techniques. It depends upon the possibility of expressing the transmission function corresponding to the best weighting function in the form of equation (13) of Chapter 11, which is associated with the symmetry of the weighting function, as shown in Section A.7. The method is based upon the determination of the envelope of the Q-function. The Q-function is first differentiated in order to obtain the equation which determines the values of $\omega$ at which the maxima and minima occur. This transcendental equation is not solved but is used to eliminate the trigonometric functions in the expression of the Q-function. The resulting expression, which is an irrational function of $\omega^2$, is then squared in order to make it a rational function of $\omega$. The substitution $p^2 = -\omega^2$ is made and the expression is then resolved into two factors of which one contains all the poles with negative real parts while the other contains all the poles with positive real parts, the two factors being conjugate complex when $p = i\omega$. The first factor is then taken as an approximation of the desired transmission function. Applying the method to the desired transmission functions defined by (13) and (14) of Chapter 11, we get

$$y_0(p) = \frac{2}{2 + p}, \qquad y_1(p) = \frac{12}{12 + 6p + p^2}, \qquad y_2(p) = \frac{120}{120 + 60p + 12p^2 + p^3}. \tag{6}$$

This last is the basis for the design of a position and rate smoothing circuit for a proposed computer for controlling bombers from the ground.¹¹ This design is described briefly in Chapter 13.

The  third  p-method  is  based  upon  the  ascend- 
ing power-series  expansion  of  the  transmission 
function  corresponding  to  the  best  weighting 
function.  Examples  of  such  power  series  are 
given  by  (15)  of  Chapter  11.  The  method  of 
approximation is one which is credited to Padé in O. Perron's "Kettenbrüchen." If the discussion in Section A.8 is referred to, it will be seen to be also a method of moments.

The method consists in determining the coefficients in a rational function of the form

$$\frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n} \tag{7}$$

so that the ascending power-series expansion of the rational function will agree with that of the best transmission function, term for term up to and including $p^{m+n}$. If the series for the best transmission function is

$$1 + c_1 p + c_2 p^2 + \cdots + c_{m+n} p^{m+n} + \cdots \tag{8}$$

the equations which determine the coefficients in (7) are obtained by equating coefficients of corresponding powers of p, up to and including the (m + n)th, in

$$(1 + b_1 p + \cdots + b_n p^n)(1 + c_1 p + \cdots + c_{m+n} p^{m+n})$$

and

$$1 + a_1 p + \cdots + a_m p^m.$$

The last n equations will be homogeneous in the b's and c's.

It  has  been  expedient  in  some  cases  to  omit 
the  last  few  of  the  (m+n)  equations  in  order 
to  have  some  control  over  the  number  of  real 
roots  and  poles  and  the  number  of  conjugate 
pairs  of  complex  roots  and  poles  in  the  result- 
ing rational  function. 
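The coefficient matching described above can be sketched in a few lines. The code below is a minimal illustration and not the authors' computing procedure: it solves the last n equations for the b's by exact Gauss-Jordan elimination and then reads the a's from the first m equations. The function names `solve` and `pade`, and the use of the exponential series as test input, are assumptions made for the example.

```python
from fractions import Fraction

def solve(A, y):
    """Solve the square linear system A x = y exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def pade(c, m, n):
    """Return (a, b), ascending coefficients of numerator and denominator.

    c holds the series coefficients c_0 .. c_{m+n} with c_0 = 1.
    """
    c = [Fraction(x) for x in c]
    coef = lambda k: c[k] if k >= 0 else Fraction(0)
    # Coefficient of p^k for k = m+1 .. m+n must vanish: sum_j b_j c_{k-j} = 0, b_0 = 1.
    A = [[coef(k - j) for j in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    y = [-coef(k) for k in range(m + 1, m + n + 1)]
    b = [Fraction(1)] + (solve(A, y) if n else [])
    # The first m equations then give the numerator coefficients directly.
    a = [sum(b[j] * coef(k - j) for j in range(min(k, n) + 1)) for k in range(m + 1)]
    return a, b

# Check against the known (2, 2) approximant of exp(p).
exp_series = [1, 1, Fraction(1, 2), Fraction(1, 6), Fraction(1, 24)]
a, b = pade(exp_series, 2, 2)
print(a, b)
```

Omitting the last few equations, as mentioned in the text, would correspond to fixing some of the b's by other means before solving the remaining system.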

In the assumed rational expression (7) the difference n − m should be chosen so that the ultimate slope of the loss characteristic will be the same as for the best transmission function. According to the theorem in Section A.8, if W(t) behaves like $t^r$ as $t \to 0$, we should take n − m = r + 1. As a matter of experience the
rational  expression  has  invariably  turned  out 
to  be  physically  realizable  whenever  this  "rule" 
was  followed.  Frequently,  however,  the  rational 
expression  has  turned  out  to  be  physically 
realizable  under  small  departures  from  the 
rule. 

Examples  of  this  method  are  given  in  Chap- 
ter 13. 

12.3


USE OF FEEDBACK AMPLIFIERS
AND SERVOMECHANISMS

In  this  section  we  shall  describe  the  use  of 
feedback  amplifiers  and  servomechanisms  to 
obtain  desired  transmission  functions.  For  com- 
plete discussions  of  the  most  recent  technical 
advances  in  the  analysis  and  design  of  feedback 
amplifiers  and  servomechanisms  the  reader 
should  consult  some  of  the  modern  literature 
on these subjects.2,3,5,15,16,17

Let us assume that we have two networks whose transmission functions are $Y_1(p)$ and $Y_2(p)$, respectively, as shown in Figure 3. For a signal E(t) applied to the first network the short-circuit output current is $I_1(t) = Y_1(p) \cdot E(t)$. For a signal V(t) applied to the second network the short-circuit output current is $I_2(t) = Y_2(p) \cdot V(t)$. With the networks sharing a common short-circuiting conductor as shown in Figure 4, the current through the conductor is $I_1 + I_2$. If the source which develops the voltage V(t) across the input terminals of the second network were in fact under the control of the current through the conductor, as shown schematically in Figure 5, in such a manner that it had to develop that voltage V(t) which reduces the current in the conductor to zero, then

$$Y_1(p) \cdot E(t) + Y_2(p) \cdot V(t) = 0.$$

Hence, the transmission function (now a voltage-voltage ratio) of the arrangement shown in Figure 5 must be

$$Y(p) = -\frac{Y_1(p)}{Y_2(p)}. \tag{9}$$

Figure 3. Schematic representation of networks for feedback circuit application.

Figure 4. First step in combining networks.

Figure 5. Output voltage controlled by short-circuit current across intermediate terminals.

This  relationship  provides  a  method  of  ob- 
taining transmission  functions  with  complex 
poles without the requirement of coils.ᵇ The
complex roots of Y(p) must be assigned to the
numerator of $Y_1(p)$, and the complex poles of
Y(p) to the numerator of $Y_2(p)$. Aside from
this,  the  other  roots  and  poles  of  Y(p)  may  be 
assigned  in  any  way  which  is  favorable  to  good 
design  practice.  Redundant  factors  may  be  in- 
troduced if  they  are  desirable,  as  is  done  in  the 
examples  described  in  Sections  13.1.5  and  13.3. 

The  source  of  the  voltage  V(t)  in  Figure  5 
does not have to be controlled by the current
through  the  short-circuiting  conductor.  Since 
the  current  through  any  short  circuit  must  be 
zero  if  the  voltage  across  the  short-circuited 
terminals  is  zero  before  the  short  circuit  is  con- 
nected across  them,  the  source  of  the  voltage 
V(t)  may  just  as  well  be  controlled  by  the 
open-circuit  voltage,  as  shown  in  Figure  6.  It 
is  clear  that  the  source  of  the  voltage  V(t)  is 
ideally  an  infinite  gain  amplifier.  It  is  not  nec- 
essary, however,  that  the  amplifier  have  ideally 
unilateral  transmission  and  infinite  input  and 
output  impedances,  since  departures  from  these 
ideal  characteristics  may  be  compensated  for  in 
the  design  of  the  feedback  network. 
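The approach to the ideal relation Y(p) = -Y_1(p)/Y_2(p) as the amplifier gain grows can be illustrated with a deliberately simplified model. The sketch below assumes that the summed short-circuit current develops a voltage across a fixed impedance Z3 and that an ideal voltage amplifier of gain `gain` drives the second network from it; this model, the two rational functions, and all names are illustrative assumptions, not the exact circuit relations of the text.

```python
def closed_loop(p, Y1, Y2, gain, Z3=1.0):
    """Voltage ratio V/E under the assumed model v3 = Z3*(Y1*E + Y2*V), V = -gain*v3."""
    loop = gain * Z3
    return -loop * Y1(p) / (1.0 + loop * Y2(p))

def ideal(p, Y1, Y2):
    """Limiting voltage ratio for infinite gain, -Y1/Y2."""
    return -Y1(p) / Y2(p)

# Hypothetical transfer admittances. The numerator of Y2 has complex roots,
# so the overall function -Y1/Y2 has a pair of complex poles.
Y1 = lambda p: 1.0 / (1.0 + p)
Y2 = lambda p: (1.0 + 0.4 * p + p * p) / ((1.0 + p) * (1.0 + 2.0 * p))

p = 0.8j  # one test frequency on the imaginary axis
errors = [abs(closed_loop(p, Y1, Y2, g) - ideal(p, Y1, Y2)) for g in (10.0, 1e3, 1e6)]
print(errors)
```

In this model the departure from the ideal ratio falls roughly in inverse proportion to the loop gain, which is the behavior the finite-gain correction is meant to capture.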

The simple result expressed by (9) may be readily modified to take account of the finite gain of a physical amplifier. The modification will be expressed as an extra factor which corresponds to the "μβ effect" or "μβ error"lie commonly encountered in the theory and design of feedback amplifiers.

ᵇ This observation was first made by R. L. Dietzold.

Figure 6. Output voltage controlled by open-circuit voltage across intermediate terminals.

The exact transmission function of the circuit shown in Figure 6 is most simply expressed in terms of the following quantities:

$Y_1(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 1.

$Y_2(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 2.

$Z_2(p)$ = impedance between terminal-pair No. 2, with terminal-pair No. 3 shorted.

$Z_3(p)$ = impedance between terminal-pair No. 3, with amplifier dead, terminal-pair No. 1 shorted, and terminal-pair No. 2 open.

$G(p)$ = transadmittance of amplifier.
Then 


i  - 


i 


(10) 


The quantity $G Y_2 Z_2 Z_3$ is the μβ of the circuit. The quantity $Y_1 Y_2 Z_2 Z_3$ to which Y reduces when G = 0 represents the direct transmission of the circuit.

The active impedance across terminal-pair No. 2 is

$$Z_{2A} = \frac{Z_{2P}}{1 - G Y_2 Z_2 Z_3} \tag{11}$$

where

$$Z_{2P} = Z_2 \left(1 + Y_2^2 Z_2 Z_3\right). \tag{12}$$

$Z_{2P}$ is the passive impedance across terminal-pair No. 2. It differs from $Z_2$ in that terminal-pair No. 3 is open.


The  exact  expression  (10)  of  the  transmis- 
sion function  is  useful  chiefly  as  a  check  on  the 
simpler  but  approximate  expression  (9).  It  is 
in general quite practicable to make the transadmittance or transconductance G of the amplifier large enough so that the μβ effect may be neglected.

In  accordance  with  the  sense  in  which  the 
term  "servomechanism"  is  used  by  MacColl,4 
a  feedback  circuit,  such  as  that  shown  in  Fig- 
ure 6,  is  a  servomechanism  —  more  specifically, 
an  electronic  servomechanism  —  since  it  oper- 
ates on  the  ideal  principle  of  maintaining  zero 
voltage  across  the  terminal-pair  No.  3.  An 
electromechanical  counterpart  of  the  circuit 
shown in Figure 6 is shown in Figure 7. These circuits assume that the signal E(t) is a modulated d-c carrier.

Figure 7. Electromechanical counterpart of feedback amplifier circuit resulting in servomechanism. (Component labels: modulator; 2-phase induction motor.)

If  the  signal  is  a  modulated  a-c  carrier, 
"shaping"  cannot  be  done  conveniently  by  elec- 
trical networks.  The  difficulty  may  be  avoided 
by  various  special  devices.  An  example  is  de- 
scribed and  illustrated  in  Section  13.4. 


12.4 


DESIGN  OF  RC  NETWORKS 


In  this  section  we  will  describe  and  illustrate 
two  general  methods  of  designing  RC  networks. 
The first is most useful when the transmission function is finite and not zero at zero frequency; the second, when the transmission function is zero at zero frequency. The case of a transmission function with a pole at zero frequency will not be considered, since it is covered by the methods described in the preceding section, in conjunction with the methods described below.
Let

$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_{n+1} p^{n+1}}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \tag{13}$$

with simple, real, negative poles. Dividing by p, expanding into partial fractions and multiplying through by p, we get

$$Y(p) = \frac{a_{n+1}}{b_n}\,p + \left(a_0 + \frac{A_1 p}{p + \alpha_1} + \frac{A_2 p}{p + \alpha_2} + \cdots\right) - \left(\frac{B_1 p}{p + \beta_1} + \frac{B_2 p}{p + \beta_2} + \cdots\right)$$

where the A's, B's, α's and β's are positive real quantities. The first term must be associated with those in the first parentheses if $a_{n+1} > 0$, with those in the second parentheses if $a_{n+1} < 0$. The transmission function is now in the form

$$Y(p) = Y_A(p) - Y_B(p) \tag{14}$$

where $Y_A(p)$ and $Y_B(p)$ are physically realizable driving-point admittances of RC type. Each term of the form $pA/(p + \alpha)$ is the admittance of the two-terminal, two-element network shown in Figure 8. Each term in (14) therefore represents a parallel combination of two-element networks of the type shown in Figure 8 and a conductance $a_0$ in the case of $Y_A(p)$, and a capacitance $|a_{n+1}|/b_n$ in the case of either $Y_A(p)$ or $Y_B(p)$. By well-known methods these two-terminal networks may be transformed into a variety of other configurations.

Figure 8. Simple RC network.

Figure 9. Method of realizing RC transmission functions, requiring phase inverter. (Block labels: phase inverter; summing amplifier.)
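The partial-fraction step can be made concrete with a short sketch. It assumes the denominator is supplied through its pole magnitudes γ_k, so that the denominator is the product of the factors (1 + p/γ_k), and it treats each term pA/(p + α) as a resistance 1/A in series with a capacitance A/α, which is one network having that admittance. The function names and the numerical values are hypothetical.

```python
def rc_decompose(a, gammas):
    """Split Y(p) = N(p)/D(p) into the arms Y_A and Y_B of Y = Y_A - Y_B.

    a      : ascending numerator coefficients a_0 .. a_{n+1} (a_{n+1} may be omitted)
    gammas : positive numbers; the poles of Y(p) are at p = -gamma_k
    Returns (a_0, c, arms): c is the coefficient of the term c*p, and arms["A"],
    arms["B"] hold (R, C) pairs of the series RC branches.
    """
    n = len(gammas)
    numer = lambda p: sum(ck * p ** k for k, ck in enumerate(a))
    b_n = 1.0
    for g in gammas:
        b_n /= g                      # leading denominator coefficient
    c = a[n + 1] / b_n if len(a) > n + 1 else 0.0
    arms = {"A": [], "B": []}
    for k, g in enumerate(gammas):
        prod = 1.0
        for j, gj in enumerate(gammas):
            if j != k:
                prod *= 1.0 - g / gj
        r = -numer(-g) / prod         # coefficient of p/(p + g) in the expansion
        if r > 0:
            arms["A"].append((1.0 / r, r / g))
        elif r < 0:
            arms["B"].append((-1.0 / r, -r / g))
    return a[0], c, arms

def admittance(p, a0, c, arms):
    """Evaluate Y_A(p) - Y_B(p) from the element values."""
    y = a0 + c * p
    for sign, key in ((1.0, "A"), (-1.0, "B")):
        for R, C in arms[key]:
            y += sign * p * C / (1.0 + p * R * C)
    return y

a = [1.0, 3.0, 2.0, 0.5]     # hypothetical numerator coefficients
gammas = [1.0, 2.0]          # hypothetical poles at p = -1 and p = -2
a0, c, arms = rc_decompose(a, gammas)
p = 0.7
direct = sum(ck * p ** k for k, ck in enumerate(a)) / ((1 + p / 1.0) * (1 + p / 2.0))
print(arms, admittance(p, a0, c, arms), direct)
```

For these values one branch falls in Y_A and one in Y_B, so a phase inverter or a lattice is needed to form the difference.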


The transmission function (14) may be realized in the arrangement shown in Figure 9 or in that shown in Figure 10. The latter is a lattice network which is suitable only in a balanced-to-ground circuit. To obtain an unbalanced passive equivalent of this network we may resort to steps which will be described later in this section.

Figure 10. Lattice prototype for passive networks with RC transmission characteristics. (Figure labels: line branch; $I = (Y_A - Y_B) \cdot E$.)

The second general method of designing RC networks is most useful when

$$Y(p) = p\,\frac{a_0 + a_1 p + \cdots + a_n p^n}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \tag{15}$$

with simple, real, negative poles. Now, if the lattice in Figure 10 were driven from an infinite-impedance source of current $I_s$, the output current would be

$$I = I_s\,\frac{1 - Y_B/Y_A}{1 + Y_B/Y_A}.$$

If, furthermore,

$$\frac{Y_B}{Y_A} = \frac{k - Y(p)/p}{k + Y(p)/p} \tag{16}$$

then

$$\frac{I}{I_s} = \frac{Y(p)}{kp}.$$

Taking  it  for  granted  for  the  moment  that  the 
lattice  can  be  transformed  as  shown  schemat- 
ically in  Figure  11,  we  may  then  discard  the 
condenser  across  the  output  terminals  and,  by 
Thevenin's  theorem,1"  we  may  replace  the 
condenser  across  the  input  terminals  and  the 
infinite-impedance  current  source  by  a  series 
condenser  and  a  zero-impedance  voltage  source. 
The result is shown in Figure 12. Since $I_s = pCE$ we now have

$$I = \frac{C}{k}\,Y(p)\,E$$

which is the desired result, to a constant factor.

The factor k should in general be taken as small as possible subject to the requirement that all the roots and poles of (16) be simple, real, and negative. It can always be taken large enough to fulfill this requirement. A suitable value may be easily chosen by inspection of a plot of Y(p)/p for negative real values of p.

Figure 11. Step in transformation of networks with zero transmission at zero frequency.

Figure 12. Final step in transformation of networks with zero transmission at zero frequency.

The  numerator  and  denominator  of  (16)  are 
of  equal  degree  and  therefore  contain  the  same 
number  of  linear  factors.  These  factors  may  be 
assigned  to  YA  or  to  YB  arbitrarily  except  that 
$Y_A$ and $Y_B$ must be physically realizable driving-point admittance functions which behave
ultimately  like  condensers  as  the  frequency  in- 
creases indefinitely;  that  is,  roots  and  poles 
must  alternate  and  there  must  be  a  simple  pole 
at  infinity. 
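The formation of the ratio in (16) and the choice of k can be tried out with exact polynomial arithmetic. The sketch below assumes that (16) has the form Y_B/Y_A = (k − Y(p)/p)/(k + Y(p)/p); the sample function Y(p)/p = (1 + 12p)/(1 + 24p + 64p²) and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def lattice_ratio(num, den, k):
    """Return ascending coefficients (N, D) with Y_B/Y_A = N/D = (k*den - num)/(k*den + num).

    num, den : ascending coefficients of Y(p)/p = num/den
    """
    size = max(len(num), len(den))
    pad = lambda c: [Fraction(x) for x in c] + [Fraction(0)] * (size - len(c))
    n, d = pad(num), pad(den)
    N = [k * di - ni for ni, di in zip(n, d)]
    D = [k * di + ni for ni, di in zip(n, d)]
    return N, D

def poly_eval(c, p):
    """Evaluate a polynomial given by ascending coefficients."""
    return sum(ci * p ** i for i, ci in enumerate(c))

num = [1, 12]        # 1 + 12p  (illustrative)
den = [1, 24, 64]    # 1 + 24p + 64p^2  (illustrative)
N, D = lattice_ratio(num, den, 1)
print(N, D)          # N = 4p(3 + 16p), D = 2(1 + 2p)(1 + 16p)

# With this ratio the lattice current ratio (1 - x)/(1 + x) reduces to Y(p)/(k p).
p = Fraction(1, 3)
x = poly_eval(N, p) / poly_eval(D, p)
current_ratio = (1 - x) / (1 + x)
```

For k below 1 the constant term of N becomes negative in this sample, which puts a root on the positive real axis; that is the sense in which k = 1 is the smallest admissible value here.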

There  are  five  kinds  of  steps  which  may  be 
taken  to  transform  a  lattice  into  an  unbalanced 
form.  These  steps  are  based  upon  Bartlett's 
bisection  theorem,14  and  may  be  taken  in  any 
order  and  as  often  as  necessary.  Each  of  them 
will  now  be  described  as  it  would  be  applied 
directly  to  Figure  10.  In  the  following  diagrams 
a  lattice  enclosed  in  a  rectangle  means  an  un- 
balanced network  whose  configuration  may  not 
be  known  yet,  but  whose  lattice  prototype  is  as 
indicated. 

1. Shunt network pulled out of both branches: shown in Figure 13.

2. Shunt network pulled out of the line branch only: shown in Figure 14.

3. Series network pulled out of both branches: shown in Figure 15.ᶜ

4. Series network pulled out of the lattice branch only: shown in Figure 16.ᶜ

5. Breakdown into parallel lattices: a fairly obvious step which need not be illustrated.

Figure 13. Step in transformation of lattice; shunt networks pulled out of both branches.

Figure 14. Step in transformation of lattice; shunt network pulled out of line branch only.

Figure 15. Step in transformation of lattice; series networks pulled out of both branches.

Figure 16. Step in transformation of lattice; series network pulled out of lattice branch only.

ᶜ Given in impedance form.
As an example of (13) consider

$$Y(p) = \frac{a_0 + a_1 p + a_2 p^2}{1 + b_1 p}$$

where all the coefficients are positive. Since

$$Y(p) = \frac{a_2}{b_1}\,p + a_0 - \frac{(a_2 - a_1 b_1 + a_0 b_1^2)\,p}{b_1 (1 + b_1 p)}$$

there is no problem if $a_1 > (a_2/b_1) + a_0 b_1$. But if $a_1 < (a_2/b_1) + a_0 b_1$ we have the problem of transforming the lattice in Figure 17. We can apply steps 2 and 4 immediately, but find that the residual lattice cannot be transformed unless $a_1 > (a_2/b_1)$. Under this additional restriction we can apply step 3, obtaining finally the network shown in Figure 18.

Figure 17. Illustrative lattice prototype.

As an example of (15) consider

$$Y(p) = \frac{p\,(1 + 12p)}{1 + 24p + 64p^2}.$$

Taking k = 1 (the smallest value which may be assigned), we get

$$\frac{Y_B}{Y_A} = \frac{2p\,(3 + 16p)}{(1 + 2p)(1 + 16p)}.$$

One way of choosing $Y_A$ and $Y_B$ is

$$Y_A = \frac{(1 + 2p)(1 + 16p)}{2(3 + 16p)}, \qquad Y_B = p.$$

This leads finally to the network shown in Figure 19. Such a simple network is possible of course because Y(p) happens to satisfy the requirements of a physically realizable driving-point admittance function. However, another way of choosing $Y_A$ and $Y_B$ is

$$Y_A = \frac{1 + 2p}{2}, \qquad Y_B = \frac{p\,(3 + 16p)}{1 + 16p}.$$

This leads to the network shown in Figure 20.

Figure 18. Unbalanced equivalent of illustrative lattice prototype when $a_2/b_1 < a_1 < (a_2/b_1) + a_0 b_1$.

Figure 19. RC network with zero transmission at zero frequency.

Figure 20. Another RC network with zero transmission at zero frequency.

THE ILLUSTRATIVE material described in this chapter is taken from four practical applications.

1.  Second-derivative  circuit  for  the  M9  anti- 
aircraft director. 

2.  Position  data  smoother  for  the  "close  sup- 
port plotting  board,"  with  delay  correction  for 
constant  velocity  aircraft. 

3.  Position  and  rate  circuit  for  the  "com- 
puter for  controlling  bombers  from  the 
ground,"  with  optional  delay  correction  of  posi- 
tion data  for  constant-velocity  aircraft. 

4. Position and rate circuit using electromechanical servomechanisms.

The  design  and  analytical  procedure  used  in 
the  first  application  has  not  heretofore  been 
described  in  writing.  Hence,  considerably  more 
space  will  be  devoted  to  it  than  to  the  other 
three  applications.  The  latter  have  been  de- 
scribed in  detail  in  reports.1" 1;  13 


13.1  SECOND-DERIVATIVE CIRCUIT DESIGN

13.1.1  Realizable Approximation of Best Transmission Function

The best transmission function for the second-derivative circuit was taken to be

$$Y(p) = p^2\,y_2(p),$$

in the notation of Chapter 11. This assumes flat random noise in position data and, arbitrarily, 1-second smoothing and settling time. The series expansion of $y_2(p)$ is, according to expressions (15) of Chapter 11,

$$y_2(p) = 1 - \frac{1}{2}p + \frac{1}{7}p^2 - \frac{5}{168}p^3 + \frac{5}{1008}p^4 + \cdots.$$

The form of the rational approximation,

$$y(p) = \frac{1}{1 + b_1 p + b_2 p^2 + b_3 p^3 + b_4 p^4},$$

was chosen for simplicity under the requirement that the transmission function $p^2 y(p)$ should cut off at the rate of 12 db per octave.ᵃ This requirement was set as a precaution against noise due to granularity of the coordinate-conversion potentiometers in the director.

Following the procedure outlined in Section 12.2 the following equations were obtained:

$$b_1 - \frac{1}{2} = 0$$

$$b_2 - \frac{1}{2}b_1 + \frac{1}{7} = 0$$

$$b_3 - \frac{1}{2}b_2 + \frac{1}{7}b_1 - \frac{5}{168} = 0$$

$$b_4 - \frac{1}{2}b_3 + \frac{1}{7}b_2 - \frac{5}{168}b_1 + \frac{5}{1008} = 0$$

whence

$$b_1 = \frac{1}{2}, \qquad b_2 = \frac{3}{28}, \qquad b_3 = \frac{1}{84}, \qquad b_4 = \frac{1}{1764}.$$

Since

$$p^4 + 21p^3 + 189p^2 + 882p + 1764 = \left(p^2 + \frac{21 + \sqrt{21}}{2}\,p + 42\right)\left(p^2 + \frac{21 - \sqrt{21}}{2}\,p + 42\right),$$

y(p) would have two conjugate pairs of complex poles, viz.,

$$p = -6.40 \pm i\,1.047, \qquad -4.10 \pm i\,5.02,$$

of which one pair is very nearly real.
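The series coefficients and the four b's can be reproduced with exact fractions. The sketch below assumes that the best weighting function is W(t) = 30t²(1 − t)² on 0 ≤ t ≤ 1 (the "best" response listed for the linear ramp signal) and that the series coefficients are c_k = (−1)^k M_k / k!, where M_k is the k-th moment of W; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def moment(k):
    """Integral of t^k * 30 t^2 (1 - t)^2 over 0..1, i.e. 30 * (k+2)! * 2! / (k+5)!."""
    return Fraction(30 * factorial(k + 2) * 2, factorial(k + 5))

def series_coeffs(n):
    """Coefficients c_0 .. c_n of the ascending power series of the transmission function."""
    return [Fraction((-1) ** k) * moment(k) / factorial(k) for k in range(n + 1)]

def all_pole_denominator(c, n):
    """b_0 .. b_n such that (1 + b_1 p + ... + b_n p^n)(c_0 + c_1 p + ...) = 1 up to p^n."""
    b = [Fraction(1)]
    for k in range(1, n + 1):
        b.append(-sum(b[j] * c[k - j] for j in range(k)))
    return b

c = series_coeffs(4)
b = all_pole_denominator(c, 4)
print(c)
print(b)
```

Multiplying the resulting denominator by 1764 gives the quartic p⁴ + 21p³ + 189p² + 882p + 1764 quoted in the text.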

In order to simplify the circuit design, however, it was desirable to limit the number of complex poles to a single conjugate pair. This was accomplished by leaving $b_4$ arbitrary so that the denominator of y(p) was

$$1 + \frac{1}{2}p + \frac{3}{28}p^2 + \frac{1}{84}p^3 + b_4 p^4.$$

A value for $b_4$ which would make this expression vanish at two negative real values of p was found by plotting

$$1764\,b_4 = \frac{21}{x^4}\left(x^3 - 9x^2 + 42x - 84\right)$$

against x = −p, as shown in Figure 1. The right-hand member is positive only in the range x > 3.77 and has a maximum of 0.982 at about x = 6.63.

Figure 1. Graphical determination of $b_4$. (Ordinate: $1764\,b_4$; abscissa: x.)

ᵃ The design antedated the formulation of the n − m = r + 1 rule given in Section 12.2, according to which the best transmission function should have been taken as $p^2 y_3(p)$ in the notation of Chapter 11. However, no trouble was experienced in obtaining a physically realizable approximation of the complexity assumed.

In order to obtain a substantial separation between the two real poles of y(p), the value $1764\,b_4 = 0.5$ was chosen. The approximation

$$y(p) = \frac{1}{1 + \frac{1}{2}p + \frac{3}{28}p^2 + \frac{1}{84}p^3 + \frac{1}{3528}p^4}$$

has poles at

$$p = -4.17391, \qquad -31.72813, \qquad -3.04898 \pm i\,4.16463.$$

The series expansion of y(p) agrees with that of $y_2(p)$ to four terms, the fifth term being $(37/7056)\,p^4$ instead of $(5/1008)\,p^4$. The difference in the fifth term is less than 6 per cent.
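The graphical determination of b_4 can be imitated numerically. The sketch below assumes the plotted function is 1764·b_4 = 21(x³ − 9x² + 42x − 84)/x⁴ with x = −p, locates the two real poles for 1764·b_4 = 0.5 by bisection, and recomputes the fifth series coefficient with exact fractions. The bracketing intervals and function names are choices made for this example.

```python
from fractions import Fraction

def f(x):
    """Value of 1764*b4 that places a real pole at p = -x."""
    return 21.0 * (x ** 3 - 9.0 * x ** 2 + 42.0 * x - 84.0) / x ** 4

def bisect(g, lo, hi, tol=1e-10):
    """Root of g in [lo, hi], assuming g changes sign exactly once there."""
    g_lo_positive = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

target = 0.5
x_small = bisect(lambda x: f(x) - target, 4.0, 6.63)     # rising side of the curve
x_large = bisect(lambda x: f(x) - target, 6.63, 100.0)   # falling side of the curve
print(x_small, x_large)

# Fifth series coefficient of the realized y(p) with b4 = 1/3528.
c = [Fraction(1), Fraction(-1, 2), Fraction(1, 7), Fraction(-5, 168)]
b = [Fraction(1), Fraction(1, 2), Fraction(3, 28), Fraction(1, 84), Fraction(1, 3528)]
c4_realized = -(b[4] + b[3] * c[1] + b[2] * c[2] + b[1] * c[3])
relative_difference = (c4_realized - Fraction(5, 1008)) / Fraction(5, 1008)
print(c4_realized, float(relative_difference))
```

Any value of 1764·b_4 between zero and the maximum of the curve gives two real poles; values nearer the maximum bring them closer together, which is why a value well below the maximum was chosen.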

The  realized  approximation  and  the  best 
weighting  function  are  shown  in  Figure  3. 

13.1.2  Transient Responses

The responses of the physical network whose transmission function is $p^2 y(p)$ are compared to those of the best network whose transmission function is $p^2 y_2(p)$, in Figures 2, 3, and 4. The signals for which (and the formulas by which) these responses were computed are tabulated below.

              Signal                  Response formulas
Figure    t < 0    t ≥ 0       Realized            Best
  2         0        1         L⁻¹[p y(p)]         60t(1 − 2t)(1 − t)
  3         0        t         L⁻¹[y(p)]           30t²(1 − t)²
  4         0       t²/2       L⁻¹[y(p)/p]         t³(10 − 15t + 6t²)


It  has  been  noted  that  Figure  3  also  repre- 
sents the  best  and  the  realized  weighting  func- 
tions. 
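The three "best" responses are related by differentiation and integration, since the step, ramp, and parabolic signals are successive integrals of one another. The sketch below checks this with exact polynomial arithmetic, assuming the best weighting function is 30t²(1 − t)² on 0 ≤ t ≤ 1; the helper names are hypothetical.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given by ascending coefficients."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += Fraction(ai) * Fraction(bj)
    return out

def derivative(c):
    """Coefficients of the derivative."""
    return [k * Fraction(c[k]) for k in range(1, len(c))]

def integral(c):
    """Coefficients of the integral from 0 to t."""
    return [Fraction(0)] + [Fraction(c[k]) / (k + 1) for k in range(len(c))]

t = [0, 1]
one_minus_t = [1, -1]
one_minus_2t = [1, -2]

# Ramp response (also the weighting function): 30 t^2 (1 - t)^2
ramp = poly_mul([30], poly_mul(poly_mul(t, t), poly_mul(one_minus_t, one_minus_t)))
# Step response: 60 t (1 - 2t)(1 - t)
step = poly_mul([60], poly_mul(t, poly_mul(one_minus_2t, one_minus_t)))
# Parabolic response: t^3 (10 - 15 t + 6 t^2)
parabolic = poly_mul(poly_mul(t, poly_mul(t, t)), [10, -15, 6])

print(derivative(ramp) == step, integral(ramp) == parabolic)
```

The parabolic response reaches 1 at t = 1, which is the correct second derivative of t²/2.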




Figure 2. Responses to step function, viz., E(t) = 1 when t > 0.

Figure 3. Responses to linear ramp function, viz., E(t) = t when t > 0; second derivative smoothing functions.

Figure 4. Responses to parabolic ramp function, viz., E(t) = (1/2)t² when t > 0; second derivative settling characteristics.

If a signal of the form

$$E(t) = a_0 + a_1 t + \tfrac{1}{2} a_2 t^2$$

were to be applied suddenly to the second-derivative circuit at t = 0 the response would be

$$V(t) = \frac{a_0}{T^2}\,A_0\!\left(\frac{t}{T}\right) + \frac{a_1}{T}\,A_1\!\left(\frac{t}{T}\right) + a_2\,A_2\!\left(\frac{t}{T}\right)$$

where $A_0$, $A_1$, $A_2$ stand for the responses shown in Figures 2, 3, and 4, respectively, and where t is the time in seconds and T is the nominal smoothing time. The response V(t) is the indicated acceleration of the target.

The  sudden  application  of  the  instantaneous 
position  and  velocity  components  of  the  signal 
to  the  second-derivative  circuit  will  give  rise  to 
some  very  serious  consequences  unless  special 
measures  are  taken  to  mitigate  them.  To  see 
this  let  it  be  assumed  that  T  =  20  seconds  and 
that the target is at such a range that $a_0$ = 20,000 yards when the signal E(t) is applied to the second-derivative circuit. Each unit of $A_0$ in the ordinate scale of Figure 2 then represents an indicated acceleration of 50 yd per sec². Referring to Figure 2 it is clear not only
that  the  effective  settling  time  will  be  several 
times  the  smoothing  time  but  also  that  the  indi- 
cated acceleration  will  go  through  exceedingly 
large  maxima. 

Exceedingly  large  transient  responses  are 
not  peculiar  to  second-derivative  circuits.  They 
occur  also  in  first-derivative  circuits  in  linear 
prediction,  where  they  are  due  entirely  to  the 
initial  position  term  in  the  signal.  In  all  cases 
they  are  reduced  to  harmless  proportions  by 
special  arrangements  of  the  circuits  during  the 
operation  of  slewing. 


13.1.3  Effect of Tracking Errors on Accuracy of Prediction

The statistical effect of tracking errors on the accuracy of prediction is most readily determined from the power spectrum of the tracking errors and the transmission function of the prediction circuit.

Table 1 gives the values of the transmission function $Y_1$ of the first-derivative circuit in the M9 director, referred to a nominal smoothing time of 1 second,ᵈ and the transmission function $Y_2$ of the experimental second-derivative circuit design, also referred to a nominal smoothing time of 1 second. The transmission function of the linear prediction circuit with 10-second smoothing of first derivative is then

$$Y_l(p) = 1 + \frac{G_1}{10}\,Y_1(10p),$$

while that of the quadratic prediction circuit with 20-second smoothing of second derivative is

$$Y_q(p) = Y_l(p) + \frac{G_2}{400}\,Y_2(20p),$$

where $G_1$ and $G_2$ are determined in accordance with the discussion in Section A.10. Since

$$Y_1(p) = p\,(1 - 0.3724\,p + \cdots), \qquad Y_2(p) = p^2\,(1 - \cdots),$$

we get

$$G_1 = t_f, \qquad G_2 = \tfrac{1}{2}\,t_f^2 + 3.724\,t_f.$$

Table 1*

 9f      Re Y_1    Im Y_1     Re Y_2    Im Y_2
  1      0.174     0.666     -0.454     0.165
  2      0.651     1.166     -1.442     1.212
  3      1.312     1.358     -2.014     3.527
  4      1.943     1.203     -1.069     6.688
  5      2.382     0.821      2.000     9.409
  6      2.599     0.364      6.575    10.115
  7      2.637    -0.067     10.893     8.220
  8      2.558    -0.429     13.468     4.695
  9      2.416    -0.711     14.096     0.953
 10      2.242    -0.920     13.401    -2.092
 11      2.062    -1.070     12.064    -4.320
 12      1.885    -1.172     10.530    -5.777
 13      1.720    -1.238      9.027    -6.704
 14      1.566    -1.279      7.652    -7.169
 15      1.429    -1.299      6.438    -7.398
 16                           5.382    -7.446
 17                           4.471    -7.374
 18      1.096    -1.286      3.683    -7.221
 19      1.004    -1.268      3.015    -7.025
 20      0.926    -1.247      2.436    -6.795
 22      0.790    -1.198      1.509    -6.292
 24      0.683    -1.145      0.818    -5.780
 26      0.593    -1.091      0.301    -5.287
 28      0.518    -1.040     -0.088    -4.828
 30      0.457               -0.380    -4.402
 32      0.407    -0.945     -0.599    -4.016
 34      0.364    -0.902     -0.762    -3.666
 36      0.326    -0.862     -0.881    -3.348
 38      0.296    -0.825     -0.967    -3.062
 40      0.266    -0.790     -1.026    -2.800

* f is in c when smoothing time T = 1 sec. For T-second networks, values of f are multiples of 1/9T c, values of $Y_1$ should be divided by T, and values of $Y_2$ should be divided by T². The two networks may have different values of T.

ᵈ
>V/0
(0.9-
9494_        K.077  31  74
1.6      V  +  2.4      /.  -r  :Ui
27  01  \
v  +  ah)

Table 2 gives the values of $|Y_l(p)|^2$ and of $|Y_q(p)|^2$ for $t_f$ = 5, 10, 15, 20 seconds. These are plotted in Figures 5, 6, 7, and 8.

The last column of Table 2 and Figure 9 give the power spectrum of a composite of the range and transverse errors in a typical run. The power contained in the frequency range covered by the table accounts for 78 per cent of the total power, or an rms error of 15.8 yards out of 17.9 yards.

The rms error of prediction is the square root of the power transmitted by the prediction circuit. This is tabulated on the last line of Table 2 and in the smaller table following.

Figure 5. Power transmission ratio of linear and quadratic prediction circuits with 5-second prediction time.

Figure 6. Power transmission ratio of linear and quadratic prediction circuits with 10-second prediction time.

Table  2 


10 


90/ 

IFil* 

|Tff* 

!  Y,\*  I 

0 

1.00 

1.00 

1.00 

1.00 

1 

1.29 

1.13 

1.82 

1.60 

2 

2.10 

2.76 

4.08 

8.90 

3 

3.20 

6.85 

7.19 

26.73 

4 

4.2 

10.0 

10.1 

39.5 

5 

5.0 

10.5 

12.1 

39.9 

6 

5.3 

9.8 

13.1 

35.6 

7 

5.4 

8.8 

13.2 

30.8 

8 

5.2 

7.9 

12.8 

26.6 

9 

5.0 

7.1 

12.2 

23.0 

10 

4.7 

6.3 

11.4 

20.0 

11 

4.4 

5.7 

10.5 

17.5 

12 

4.1 

5.1 

9.7 

15.3 

13 

3.8 

4.6 

8.9 

13.5 

14 

3.6 

4.2 

8.2 

12.1 

16 

3.4 

3.8 

7.6 

10.6 

16 

3.2 

3.5 

7.0 

9.5 

17 

3.0 

3.2 

6.5 

8.5 

18 

2.8 

3.0 

0.0 

7.7 

19 

2.7 

2.8 

5.6 

7.0 

20 

2.5 

2.6 

5.3 

6.3 

rms 

error  of 

prediction 

23.9 

29.5 

33  9 

53.4 

15 

20 

IK.!* 

\Y,l* 

P*  Mk-vn 

1.00 

1.00 

1.00 

1.00 



31.4 

2.59 

2.71 

3.59 

4.81 

33.5 

6.97 

23.16 

10.74 

50.35 

35.7 

12.96 

72.51 

20.51 

159.43 

19.7 

18.6 

106.1 

29.76 

231.3 

3.6 

22.4 

104.4 

35.9 

223.9 

2.5 

24.3 

90.6 

38.9 

190.6 

1.2 

24.6 

76.6 

39.4 

158.4 

1.6 

23.8 

64.7 

38.2 

131.8 

2.1 

22.5 

55.0 

36.0 

110.6 

1.4 

21.0 

47.0 

33.5 

93.5 

0.7 

19.3 

40.4 

30.8 

79.6 

0.8 

17.7 

35.0 

28.3 

68.2 

0.8 

16.3 

30.4 

25.8 

58.9 

0.5 

14.9 

27.1 

23.6 

52.0 

0.3 

13.7 

23.4 

21.6 

44.5 

0.8 

12.6 

20.6 

19.8 

39.0 

1.1 

11.6 

18.3 

18.2 

34.4 

0.8 

10.7 

16.3 

16.8 

30.4 

0.4 
0.7 

9.9 

14.6 

15.5 

27.0 

9.2 

13.1 

14.4 

24.1 

1.0 

44.5 


85.4 


55.4  125.0 


•  P  U  in  uniu  of  180  yd"  per  c 
Time of flight      Rms error of prediction due
in seconds          to tracking errors in yards
                    Linear        Quadratic

     5               23.9           29.5
    10               33.9           53.4
    15               44.5           85.4
    20               55.4          125.0

It  is  obviously  relatively  disadvantageous  to 
use  quadratic  prediction  when  the  target  is  in 
fact  flying  a  rectilinear  unaccelerated  course. 


Figure  7.  Power  transmission  ratio  of  linear 
and  quadratic  prediction  circuits  with  15-second 
prediction  time. 
Figure 8. Power transmission ratio of linear and quadratic prediction circuits with 20-second prediction time.

The relative advantage of linear prediction should persist for target paths with only a slight amount of curvature, but this relative advantage should decrease as the curvature is increased. When the curvature exceeds a certain amount, the relative advantage should shift to quadratic prediction.

The determination of the minimum value of target path curvature at which quadratic prediction becomes relatively advantageous depends not only upon:

1. dispersion of the predicted point of impact due to tracking errors,

but also upon a number of other factors, which are:

2. actual future position of target with respect to the predicted point of impact, assuming an accurate computer and the absence of all sources of dispersion enumerated here;^e

3. dispersion due to inaccuracies in the computer and data-transmission systems;

4. dispersion due to noise in the computer and data-transmission systems;

5. dispersion due to variations in actual dead time;

6. dispersion due to gun wear and to variations in powder charge, shell weight, shell shape, etc.;

7. dispersion due to variations in meteorological conditions along the path of the shell;

8. dispersion due to variability of time-fuze calibration; and

9. lethal pattern of shell burst.

In a special illustrative case, a numerical analysis, including most of these factors (estimated), showed that quadratic prediction becomes relatively advantageous when the target acceleration exceeds about 0.1g. However, this should not be taken as a general result.

^e This is considered in detail in the next section.

Figure 9. Composite power spectrum of tracking errors of experimental radar.
Linear and Quadratic Prediction Errors on Constant-Velocity Circular Courses

The use of a finite number of derivatives of the tracking data for purposes of prediction is itself a source of prediction errors even if there were no tracking errors. Definite evaluation of these prediction errors can be made only if the path of the target is prescribed. The simplest path which can be prescribed for this purpose is a circular one at constant velocity. Such a path is fairly realistic when considered in relation to the difficulty of maneuvering a bomber and to actual records of the paths of hostile bombers over London during World War II.

The position of a target flying in a circle at constant velocity, referred to the center of the circle, is expressed by the complex quantity $Re^{i\omega t}$, where $R$ is the radius of the circle and $\omega$ is the angular rate. In terms of the velocity $V$ and the transverse acceleration $A$, we have $R = V^2/A$ and $\omega = A/V$. The predicted position is then at $R\,Y(i\omega)e^{i\omega t}$, where $Y(i\omega)$ is the transmission function of the prediction circuit. The true future position of the target, however, is at $R\exp[i\omega(t + t_f)]$. Hence, the prediction error, referred to axes fixed on the target and oriented respectively transverse to and in the direction of the present velocity, is

\[ \epsilon = R\,[\,Y(i\omega) - e^{i\omega t_f}\,] . \]

As an illustration let us consider a case in which $V = 150$ yd per sec, $A = 5$ yd per sec$^2$ and $t_f = 10$. For the linear prediction circuit

\[ Y_1(i\omega) = 1.0409 + i\,0.3296 \]

and for the quadratic prediction circuit

\[ Y_2(i\omega) = 0.9501 + i\,0.3610 \]

while

\[ e^{i\omega t_f} = 0.9450 + i\,0.3272 . \]

Hence, when the present position of the target is at $4500 + i\,0$ with respect to the center of the circle, the linear predicted point is at $4684 + i\,1483$, the quadratic predicted point is at $4276 + i\,1624$, while the true future position is at $4252 + i\,1472$. These are shown in Figure 10. The prediction error vectors are

\[ \epsilon_1 = 432 + i\,11, \qquad |\epsilon_1| = 432 \]
\[ \epsilon_2 = 24 + i\,152, \qquad |\epsilon_2| = 154 \]
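The arithmetic of this illustration can be retraced in a few lines. The sketch below is only a check of the quoted figures: it takes the two transmission-function values as given in the text (it does not derive them), and the variable names are chosen here for illustration.

```python
import cmath

# Values quoted in the text for the illustrative case.
V = 150.0      # target speed, yd per sec
A = 5.0        # transverse acceleration, yd per sec^2
t_f = 10.0     # prediction time

R = V**2 / A   # radius of the circular course, yd
omega = A / V  # angular rate, rad per sec

# Transmission functions at this omega, copied from the text (not derived here).
Y_linear = complex(1.0409, 0.3296)
Y_quadratic = complex(0.9501, 0.3610)

# Rotation of the true position over the prediction time.
true_factor = cmath.exp(1j * omega * t_f)

def prediction_error(Y):
    """Error vector R*[Y(i*omega) - exp(i*omega*t_f)], in yards."""
    return R * (Y - true_factor)

eps_linear = prediction_error(Y_linear)
eps_quadratic = prediction_error(Y_quadratic)

print("R =", R, "omega*t_f =", omega * t_f)
print("linear error   :", eps_linear, "magnitude", abs(eps_linear))
print("quadratic error:", eps_quadratic, "magnitude", abs(eps_quadratic))
```

The magnitudes come out within a yard of the 432 and 154 yd quoted above; the small differences arise because the text rounds the positions before subtracting.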


Referring to Figure 10 it may be observed that if the first-derivative component of the prediction were to be reduced by approximately 10 per cent a nearly perfect hit would be obtained. This suggests the possibility of determining empirical functions of the time of flight for the potentiometer factors $G_1$ and $G_2$ in order to improve the probability of kill. This would involve consideration of all of the sources of dispersion enumerated in the preceding section as well as a statistical study of target paths. Such a determination has not been attempted.

Figure 10. Vector diagram of linear and quadratic prediction for constant-velocity circular courses.

Physical Configuration of the Second-Derivative Circuit

In this section we shall derive a physical configuration for the second-derivative circuit. In particular it illustrates the application of feedback to the realization of weighting functions or impulsive admittances involving complex exponentials in general.* It should be pointed out, however, that the application of feedback to the end in view is not restricted to purely electronic circuits. An application involving the use of servomechanisms will be described in Section 13.4.

* Originally proposed by R. L. Dietzold.

The  transmission  function  which  concerns  us 
here  may  be  expressed  in  the  partially  factored 
form 


Y(P)  = 


((>  +  0.2087)  i/>  +  l..)S04)(/;-  +  0.3U4<»p  +  O.OttOli) 
where the poles have been adjusted to correspond to T = 20 seconds and where a constant factor has been left out.

The circuit is to be designed to work out of the amplifier in the first-derivative circuit of the M9 director. Since this much of the first-derivative circuit has a transmission function of the form $p/(p + 0.24)$, the transmission function which we have to realize is $Y_1(p)/Y_2(p)$ where


and 


P  f  0.20S7'  ip  +  i..W»4i 


Y,ip) 


U.MWp  +  IMKttWi 
p  +  0.24 

The inversion of the factor corresponding to $Y_2(p)$ is in accordance with the fact that the transmission gain through a feedback amplifier is equal to the loss in the feedback network, provided the feedback is very large. To realize the transmission function $Y_1(p)/Y_2(p)$ it is therefore necessary only to realize the transmission functions $Y_1(p)$ and $Y_2(p)$ individually. The corresponding networks are shown in Figure 11, with typical element values.

Figure 11. Physical configuration of quadratic prediction circuit for modified M9 AA director.


The input network has four elements, whereas $Y_1(p)$ has only two parameters. Hence there are two degrees of freedom in the element values of this network. One degree of freedom must be reserved for the impedance level; the other permits some latitude in the relative values of the resistances and stiffnesses.

The feedback network has four independent elements, whereas $Y_2(p)$ has three parameters. Hence there is only one degree of freedom in the element values of this network. This degree of freedom must be reserved for the impedance level.

There is, however, one degree of freedom between the impedance levels of the two networks. This follows from the fact that the transmission function of the circuit is the ratio of the transmission functions of the individual networks. The scale factor for the transmission function of the circuit is readily determined from the fact that the transmission function
must  be  approximately  pRt,C„  at  small  values 
of  p. 
In this application, position data smoothing with delay correction for constant rates of change in position was required. Assuming flat random noise in position data, and, arbitrarily, 1-second smoothing time, the best transmission function for position data smoothing without delay correction is $y_0(p)$ in the notation of Section 11.3. The best transmission function for the first-derivative circuit, if it were required, is $p\,y_1(p)$. Hence, the best transmission function for position data smoothing with full delay correction is

\[ Y_1(p) = y_0(p) + \tfrac{1}{2}\,p\,y_1(p) . \]

This corresponds to the weighting function

\[ W_1(t) = 2(2 - 3t), \qquad 0 < t < 1 . \]

The series expansion for $Y_1(p)$ is, by (15) of Chapter 11,

\[ Y_1(p) = 1 - \frac{p^2}{12} + \frac{p^3}{30} - \frac{p^4}{120} + \cdots \]
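The series coefficients can be recovered from the weighting function directly. The sketch below assumes the usual relation between a weighting function and its transmission function, namely that $Y_1(p)$ is the integral of $W_1(t)e^{-pt}$ over the smoothing interval; that convention is an assumption here, since equation (15) of Chapter 11 is not reproduced in this section. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def moment(n):
    """Exact integral of W1(t) * t**n = (4 - 6t) * t**n over 0 < t < 1."""
    return Fraction(4, n + 1) - Fraction(6, n + 2)

def series_coefficient(n):
    """Coefficient of p**n in the expansion of the integral of W1(t)*exp(-p*t)."""
    return Fraction((-1) ** n, factorial(n)) * moment(n)

# Coefficients of p**0 through p**4.
coefficients = [series_coefficient(n) for n in range(5)]
print(coefficients)
```

The vanishing coefficient of $p$ is the statement that the delay is fully corrected: the first moment of $W_1(t)$ is zero.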
The form of the rational approximation was chosen as

\[ Y(p) = \frac{1 + a_1 p}{1 + b_1 p + b_2 p^2 + b_3 p^3} \]

in order to obtain a loss characteristic which has an ultimate slope of 12 db per octave.* This requirement was also set as a precaution against noise due to granularity of the coordinate-conversion potentiometers. The coefficients are determined by

\[ b_1 = a_1 \]
\[ b_2 - \frac{1}{12} = 0 \]
\[ b_3 - \frac{1}{12}\,b_1 + \frac{1}{30} = 0 \]
\[ -\frac{1}{12}\,b_2 + \frac{1}{30}\,b_1 - \frac{1}{120} = 0 \]

whence

\[ Y(p) = \frac{1 + \frac{11}{24}\,p}{1 + \frac{11}{24}\,p + \frac{1}{12}\,p^2 + \frac{7}{1440}\,p^3} . \]

This may be expressed in the form $Y(p) = Y_2(p)/Y_3(p)$ where

\[ Y_2(p) = \frac{1}{1 + 0.1053p}, \qquad Y_3(p) = \frac{1 + 0.3530p + 0.04615p^2}{1 + 0.4583p} . \]

The circuit configuration is shown below in Figure 12.

Figure 12. Physical configuration of data-smoothing circuit for close support plotting board.

* This design also antedated the formulation of the n - m = r + 1 rule given in Section 12.2 according to which we should have taken $Y_1(p) = y_1(p) + \tfrac{1}{2}\,p\,y_2(p)$.
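The coefficient conditions can be solved and checked with exact fractions. This is a minimal sketch: it takes the series $1 - p^2/12 + p^3/30 - p^4/120$ as given, solves the four conditions in turn, and confirms that the denominator times the series reproduces the numerator through $p^4$. The helper `poly_mul` and the variable names are illustrative only.

```python
from fractions import Fraction

# Series coefficients of the ideal function through p**4 (ascending powers).
series = [Fraction(1), Fraction(0), Fraction(-1, 12), Fraction(1, 30), Fraction(-1, 120)]

# Solve the coefficient conditions one at a time.
b2 = Fraction(1, 12)                      # from  b2 - 1/12 = 0
b1 = 30 * (Fraction(1, 120) + b2 / 12)    # from  -b2/12 + b1/30 - 1/120 = 0
b3 = b1 / 12 - Fraction(1, 30)            # from  b3 - b1/12 + 1/30 = 0
a1 = b1                                   # from  b1 = a1

def poly_mul(a, b, max_degree):
    """Multiply two polynomials (ascending powers), truncated at max_degree."""
    out = [Fraction(0)] * (max_degree + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= max_degree:
                out[i + j] += ai * bj
    return out

# Denominator times series should equal the numerator 1 + a1*p through p**4.
product = poly_mul([Fraction(1), b1, b2, b3], series, 4)

# The decimal factors quoted in the text, multiplied out for comparison.
factored = poly_mul([Fraction("1"), Fraction("0.1053")],
                    [Fraction("1"), Fraction("0.3530"), Fraction("0.04615")], 3)

print("b1, b2, b3 =", b1, b2, b3)
print("product    =", product)
print("factored   =", [float(c) for c in factored])
```

The factored form agrees with the exact denominator to about four decimal places, which is the precision of the quoted coefficients.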


13.3 CIRCUIT FOR GROUND-CONTROL BOMBING COMPUTER


In this application, rate smoothing as well as position smoothing was required. In addition, delay correction in position, for constant rate of change, was to be available but optional, and the loss characteristic was to have an ultimate slope of 12 db per octave, or more.

In accordance with the n - m = r + 1 rule, the best transmission function for position data is $y_1(p)$, whereas that for rate is $p\,y_2(p)$. A number of designs were made on this basis. However, from the point of view of network economy they were inferior to a design based on $y_2(p)$ for position data. The use of $y_2(p)$ for position data is not consistent, theoretically, with the use of $p\,y_2(p)$ for rate, but the practical advantage outweighs the theoretical disadvantage.

The rational approximation used for $y_2(p)$ is the one given in (6), Section 12.2. It may be expressed as

\[ \frac{Y_1(p)\,Y_3(p)}{Y_2(p)} \]

where

\[ Y_1(p) = \frac{1}{1 + 0.2153p} \]
\[ Y_2(p) = \frac{1 + 0.2847p + 0.03870p^2}{1 + 0.1359p} \]
\[ Y_3(p) = \frac{1}{1 + 0.1359p} \]

Figure 13. Physical configuration of linear prediction circuit for ground-control bombing computer.
It may be noted that a redundant factor has been introduced, viz., $1 + 0.1359p$, in order to secure a physically realizable $Y_2(p)$. The coefficient was chosen so that a resistance would not be required in the shunt branch of the feedback network. Referring to the circuit configuration in Figure 13, the transmission function of the input network is $Y_1(p)$, that of the feedback network is $Y_2(p)$, and that of the output network at the top is $Y_3(p)$.

The output impedance of the amplifier is reduced nearly to zero by virtue of shunt feedback. Hence, the rate circuit, as shown in Figure 13, may be derived from the amplifier output through a simple additional network whose transmission function is $p\,Y_3(p)$. Two rate outputs are provided so that the delay introduced in position may be corrected optionally without disturbing scale factors.
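The cancellation of the redundant factor is easy to confirm numerically. The sketch below assumes the overall position transmission is the input network times the output network divided by the feedback network, which is how the three networks are described above; the numerical coefficients are those quoted in the text, and the helper names are illustrative.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in ascending powers of p."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_eval(coeffs, p):
    """Evaluate a polynomial (ascending powers) at a real or complex p."""
    return sum(c * p**k for k, c in enumerate(coeffs))

# Coefficients quoted in the text.
input_den = [1.0, 0.2153]               # input network:    1 / (1 + 0.2153p)
feedback_num = [1.0, 0.2847, 0.03870]   # feedback network numerator
feedback_den = [1.0, 0.1359]            # feedback network denominator (redundant factor)
output_den = [1.0, 0.1359]              # output network:   1 / (1 + 0.1359p)

def overall(p):
    """Input network times output network, divided by feedback network."""
    y_in = 1.0 / poly_eval(input_den, p)
    y_fb = poly_eval(feedback_num, p) / poly_eval(feedback_den, p)
    y_out = 1.0 / poly_eval(output_den, p)
    return y_in * y_out / y_fb

# With the redundant factor cancelled, only this cubic denominator remains.
reduced_den = poly_mul(input_den, feedback_num)
print(reduced_den)
```

The reduced denominator comes out very close to $1 + p/2 + p^2/10 + p^3/120$. That is an observation about the quoted numbers, not a statement taken from the text.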

13.4 CIRCUIT USING SERVOMECHANISMS

In the final report, October 25, 1945, to NDRC Division 7, on the research program carried on under Contract NDCrc-178, a list is given of a number of the more important practical advantages for the use of a-c carrier in computing circuits. These advantages are:

1. Permits operation at lower levels before running into trouble with thermal noise, contact potentials, drifts due to temperature;

2. Permits use of transformers for impedance matching, voltage transformations, coupling between balanced and unbalanced circuits;

3. Permits use of hybrid coils for voltage summations of moderate precision;

4. Eliminates the necessity for modulators in servo circuits using a-c motors;

5. Permits reduction in total power consumption, rectified power for amplifiers, and voltage regulation.

However, the techniques of differentiation and of data smoothing with fixed networks in computing circuits which use d-c carrier, are not applicable to computing circuits which use a-c carrier.

The circuit described here is an example of one of the techniques used in the T15-E1 experimental curved flight director. In Figure 14 servo motors* are indicated by M, and generators by G. The motors are two-phase induction motors with one phase winding of each energized directly by the carrier source at constant amplitude. The generators are essentially two-phase induction motors also with one phase winding of each energized directly by the carrier source at constant amplitude. They deliver, at the other phase windings, carrier voltage at amplitudes proportional to the angular velocities $\dot\theta_1$ and $\dot\theta_2$ of the shafts. The potentiometers are energized by the carrier source at constant amplitude. They deliver carrier voltage at amplitudes proportional to the angular positions $\theta_1$ and $\theta_2$ of the shafts from some reference positions. The position data are represented by the modulation amplitude $E$.

* The technique of using servo motors for smoothing, as described above, is due chiefly to E. L. Norton.

Figure 14. ... circuit.

With amplifiers of sufficiently large voltage gain and power capacity, and motors of sufficiently large torque, the operational equations of the circuit are readily found by equating to zero the sum of the voltages applied to each amplifier. Thus

\[ \theta_1 + (\alpha_1 + \beta p)\,\theta_2 = E \]
\[ p\,\theta_1 - (1 + \alpha_2 p)\,\theta_2 = 0 \]

whence

\[ \theta_1 = \frac{1 + \alpha_2 p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E \]
\[ \theta_2 = \frac{p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E \]

The angular position $\theta_1$ therefore represents the smoothed position data while the angular position $\theta_2$ represents the smoothed rate.
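The step from the two operational equations to the closed forms can be checked by solving the pair as a linear system at an arbitrary complex value of p. The sketch below does this with Cramer's rule; the numerical values of alpha1, alpha2 and beta are arbitrary test values chosen here, not design values from the text.

```python
def solve_servo(E, p, alpha1, alpha2, beta):
    """Solve the two operational equations for (theta1, theta2) by Cramer's rule.

        theta1 + (alpha1 + beta*p) * theta2 = E
        p*theta1 - (1 + alpha2*p) * theta2  = 0
    """
    a11, a12 = 1.0, alpha1 + beta * p
    a21, a22 = p, -(1.0 + alpha2 * p)
    det = a11 * a22 - a12 * a21
    theta1 = (E * a22) / det       # right-hand side (E, 0) replaces column 1
    theta2 = (-a21 * E) / det      # right-hand side (E, 0) replaces column 2
    return theta1, theta2

def closed_form(E, p, alpha1, alpha2, beta):
    """The closed-form expressions for theta1 and theta2."""
    den = 1.0 + (alpha1 + alpha2) * p + beta * p**2
    return (1.0 + alpha2 * p) * E / den, p * E / den

# Arbitrary test point (hypothetical parameter values).
sample = dict(E=1.0, p=0.3 + 2.0j, alpha1=0.4, alpha2=0.6, beta=0.1)
print(solve_servo(**sample))
print(closed_form(**sample))
```

At small p the first expression tends to E and the second to pE, which is the sense in which the two shaft positions represent smoothed position and smoothed rate.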
The past discussion has been more or less clearly directed at predictor systems having certain well-defined properties. For example, it has been tacitly assumed that the first part of the prediction system will consist of geometrical manipulations transforming the raw input data into other quantities, such as the components of velocity in Cartesian or intrinsic coordinates, which we have some physical reason to believe should be approximately constant for extended periods.^a These quantities, then, are isolated explicitly in the circuit and are the actual effective inputs of the data-smoothing networks. The data-smoothing networks themselves are, of course, definitely assumed to be linear and invariable.

This is obviously a straightforward attack but it does not necessarily exhaust all possibilities. For example, advantages may be gained by using data-smoothing networks which are nonlinear or which vary with time or target position. It may also be possible to smooth the input data according to some geometric assumption, such as straight line flight, without the necessity of isolating geometrical parameters explicitly.

This chapter attempts to illustrate these possibilities by some rather scattered examples. Data-smoothing networks which vary with time seem to give improved performance over fixed networks, and have been studied with some care. Several examples are given at the end of the chapter. None of the other lines, however, has been explored at all thoroughly. The examples of data-smoothing networks variable with time are, in a sense, illustrations of nonlinearity also, since they all operate on the assumption that the cycle of the network's variation with time begins anew at each marked change in course. Since a change in course is exactly like a tracking error, except that it is much larger, this resetting requires a nonlinear control circuit which responds to large amplitude effects but not to small ones. This, however, is evidently a very mild sort of nonlinearity. More thoroughgoing nonlinearities have not been studied. There seems to be no a priori reason for supposing that they would appreciably improve the performance of data-smoothing networks.

The first part of the chapter gives examples of data-smoothing schemes which do not require the isolation of geometrical parameters. They are based on degenerative feedback circuits which satisfy the requisite formal relations but which might, in some cases, be unstable in practice. This portion of the material is included primarily for its possible suggestive value rather than for its concrete practical usefulness.

^a This is true ideally even in the Wiener system since Wiener assumes that transformations will be made to some suitable coordinate system, preferably the intrinsic, before the statistical prediction method is applied.

14.1 THE PROTOTYPE FEEDBACK CIRCUIT

The diversity of particular circuits can be given a certain unity by regarding them all as modifications of the feedback smoothing circuit shown originally in Figure 2 of Chapter 10. In accordance with the discussion of that figure it will be convenient to suppose that the resistive feedback path is introduced to limit the gain of the amplifier proper, so that the structure reduces to an amplifier with high but finite gain and a pure capacity feedback. The circuit has a net loop gain, and is consequently degenerative, at any moderately high frequency. For our present purposes, it is convenient to recall the general property of degenerative feedback amplifiers, that they tend to suppress any given frequency by the amount of the degenerative feedback for that frequency. This suppression obtains not only at the amplifier output but at many other points in the circuit as well. For example, it holds at the amplifier input if we combine the original applied voltage with the voltage contributed by the feedback circuit.^b Thus, except for the absolute signal level, it is not necessary to transmit through the amplifier of Figure 2 of Chapter 10 in order to produce the smoothing effect. It would be sufficient to hang the input circuit of the amplifier, as a two-terminal impedance, across the circuit.

^b This follows immediately from the fact that, since the characteristics of the amplifier proper are not changed by the addition of the feedback path, the output voltage is always a fixed multiple of the net input voltage.
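The remark that the suppression holds at the amplifier input as well as at the output can be illustrated with a generic single-loop model. This is a sketch under stated assumptions, not the circuit of Figure 2 of Chapter 10: the amplifier is taken as an ideal gain of -mu, and the feedback path is taken to add a fraction beta of the output to the applied voltage. The names `mu`, `beta` and the sample values are hypothetical.

```python
def closed_loop(e_applied, mu, beta):
    """Net input and output of an idealized single-loop feedback amplifier.

    Assumed model (for illustration only):
        output    = -mu * net_input
        net_input = e_applied + beta * output
    Solving the pair gives net_input = e_applied / (1 + mu*beta).
    """
    net_input = e_applied / (1.0 + mu * beta)
    output = -mu * net_input
    return net_input, output

# A frequency at which the loop gain mu*beta is large (hypothetical values).
net_input, output = closed_loop(1.0, mu=1000.0, beta=0.5j)
print("net input:", abs(net_input))
print("output   :", abs(output))
print("output without feedback:", 1000.0)
```

Both the net input and the output are reduced by the same factor 1 + mu*beta relative to their values without feedback, since the output is always the fixed multiple -mu of the net input.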

14.2 SIMULTANEOUS SMOOTHING IN THREE COORDINATES

The property of degenerative feedback circuits which has just been described is conveniently illustrated by a three-dimensional extension of the original smoothing circuit of Figure 2 of Chapter 10. The three-dimensional circuit is shown in Figure 1. The three input voltages are the quantities $\dot D$, $D\dot E$, and $D\dot A\cos E$, where $D$, $E$, and $A$ are, respectively, slant range, elevation, and azimuth. The three voltages will be recognized as the three components of the target motion in a tilted and rotating rectangular coordinate system. One axis of the tilted system is directed along the instantaneous line of sight to the target and the other two are perpendicular to this one in the vertical and horizontal planes respectively.^c It is assumed that these input rates represent target motion in a straight line, plus the usual tracking errors. The object of the smoothing system is to provide shunt impedances which will tend to suppress the tracking errors by feedback action, according to the principles described in the preceding section, without disturbing the portions of the input voltages corresponding to the assumed straight line path.

Figure 1. Feedback smoothing in three coordinates.

We can simplify the analysis by restricting our attention to the special case of two-dimensional motion which occurs when the target course lies in a vertical plane passing directly through the antiaircraft position. This is illustrated in Figure 2. In this case the component $D\dot A\cos E$ is evidently zero. If we represent the voltage at the other two terminals, including both the original applied voltages and the voltages fed back through the circuit, by $V_1$ and $V_2$, the voltages coming out of the coordinate converter on the right-hand side in Figure 2 are

\[ v_x = V_1\cos E - V_2\sin E \]
\[ v_y = V_2\cos E + V_1\sin E \tag{1} \]

These voltages are differentiated, passed through a second coordinate converter, and fed back so that the output voltages must satisfy

\[ V_1 = \dot D - \mu(\dot v_x\cos E + \dot v_y\sin E) \]
\[ V_2 = D\dot E - \mu(\dot v_y\cos E - \dot v_x\sin E) . \tag{2} \]

In order to exhibit the smoothing action of the circuit let us denote the observed velocity components, referred to the upright and fixed rectangular coordinate system, by $u_x$ and $u_y$, so that

\[ u_x = \dot D\cos E - D\dot E\sin E \]
\[ u_y = D\dot E\cos E + \dot D\sin E . \tag{3} \]

Substituting (2) and (3) into (1), we get

\[ v_x = u_x - \mu\dot v_x \]
\[ v_y = u_y - \mu\dot v_y \]

or

\[ \mu\dot v_x + v_x = u_x \]
\[ \mu\dot v_y + v_y = u_y . \]

These show clearly that $v_x$ and $v_y$ are smoothed values of $u_x$ and $u_y$, respectively. If $\mu$ is constant the smoothing is of fixed exponential type. If $\mu$ is proportional to the time up to some maximum value, the smoothing is of the variable type described in Sections 14.6 and 14.7.

To complete the discussion of the circuit we observe that by (1)

\[ V_1 = v_x\cos E + v_y\sin E \]
\[ V_2 = v_y\cos E - v_x\sin E . \]

These show that $V_1$ and $V_2$ are the smoothed rate components referred to the tilted and rotating rectangular coordinate system. The fact that the orientation of this coordinate system, which depends upon the observed angular height $E$, is not smoothed makes no difference to the computation of the leads because this computation is made instantaneously in the same coordinate system to which the smoothed rate components are instantaneously referred.

^c This is the coordinate system which was used in the experimental T15 director. A complete prediction circuit can be obtained by using the three voltages described here as inputs to the lead servos in the T15 system. In the actual T15 system, rates in the tilted and rotating coordinate system were obtained by the so-called "memory point" method. The voltages $\dot D$, $D\dot E$, etc., required with the present method, might be obtained with the help of tachometers attached to the tracking shafts to measure the instantaneous values of $\dot D$, $\dot E$, and $\dot A$. An equivalent to the variable smoothing of the memory point method can be obtained by making the gains in the feedback paths in Figure 1 variable according to the principles described in a later section.
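The substitution of (2) and (3) into (1) is a short piece of trigonometry that can be confirmed numerically. The sketch below evaluates both sides of the resulting relations for arbitrary values of the quantities involved; all the numerical values are arbitrary test inputs chosen here, and the argument names are illustrative.

```python
import math

def feedback_identity(E, D_dot, DE_dot, vx_dot, vy_dot, mu):
    """Evaluate v_x, v_y from (1) and (2), and u - mu*v_dot from (3).

    Returns ((v_x, u_x - mu*vx_dot), (v_y, u_y - mu*vy_dot)); each pair
    should agree if the substitution described in the text is correct.
    """
    c, s = math.cos(E), math.sin(E)
    # Equation (2): terminal voltages including the fed-back terms.
    V1 = D_dot - mu * (vx_dot * c + vy_dot * s)
    V2 = DE_dot - mu * (vy_dot * c - vx_dot * s)
    # Equation (1): outputs of the first coordinate converter.
    vx = V1 * c - V2 * s
    vy = V2 * c + V1 * s
    # Equation (3): observed velocity components in fixed coordinates.
    ux = D_dot * c - DE_dot * s
    uy = DE_dot * c + D_dot * s
    return (vx, ux - mu * vx_dot), (vy, uy - mu * vy_dot)

# Arbitrary test values (hypothetical).
x_pair, y_pair = feedback_identity(E=0.6, D_dot=-120.0, DE_dot=45.0,
                                   vx_dot=3.0, vy_dot=-2.0, mu=4.0)
print(x_pair, y_pair)
```

The cross terms in sin E cos E cancel, which is why each fixed-coordinate component obeys its own first-order smoothing equation even though E is changing.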

The analysis in the general case including all three coordinates is of the same nature. Since the rate components in fixed rectangular coordinates appear in the middle of the feedback path, it is perhaps not fair to regard the circuit as an illustration of a data-smoothing device which does not rely upon the explicit isolation of the geometrical parameters of the assumed target path. It should be pointed out, however, that in comparison with a straightforward geometrical solution in which velocity components in fixed coordinates are first isolated explicitly, then smoothed, and then used to form the basis of prediction, the circuit in Figure 1 has the advantage that most of the components can be built with very low precision. What is transmitted around the feedback loop is essentially the tracking errors only. Since tracking errors are always small, very high percentage errors in the system can be tolerated.^d

^d An exception to this statement must be made for errors in the coordinate converters which fluctuate rapidly with target position.

Figure 2. Feedback smoothing in two coordinates.

SMOOTHING NETWORKS VARIABLE WITH TARGET POSITION

It was mentioned earlier that changing the data-smoothing network with the target coordinates represented one way in which the results obtained from fixed networks could be generalized. In a sense, the coordinate conversions of Figure 1 are illustrations of these possibilities. A better illustration, however, is provided by the circuit of Figure 3. The structure is intended to give smooth slant range rate from slant range data, under the assumption of unaccelerated straight line target motion.

Figure 3. Feedback smoothing with smoothing variable with position coordinates.

The relation between input and output in Figure 3 is readily seen to be*

\[ V = \frac{d}{dt}\left[ D - \mu\,\frac{d}{dt}(DV) \right] \]

or

\[ \mu\,\frac{d^2}{dt^2}(DV) + V = \frac{dD}{dt} \tag{4} \]

where $\mu$ is the amplifier gain, $D$ is slant range, and $V = dD/dt$ is slant range rate.

The principle of the circuit depends upon the fact that under the assumed target motion the square of the slant range, $D^2$, should be a quadratic function of time, so that $D\,(dD/dt)$ should be a linear function of time and $(d/dt)[D\,(dD/dt)]$ should be a constant. This last is the quantity which is fed back in Figure 3. If it actually is a constant, it has no further influence on the calculation, since the forward circuit includes a differentiator, and the operation of the circuit is the same as though no feedback term were present. This can be verified by setting $D = D_0 = \sqrt{a + 2bt + ct^2}$, corresponding to ideal straight line flight, in equation (4). It is readily seen that the equation is satisfied by

\[ V = V_0 = \frac{b + ct}{\sqrt{a + 2bt + ct^2}} = \frac{dD_0}{dt}, \]

the first or feedback term being zero.
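The claim that the feedback term vanishes for ideal straight line flight can be checked numerically. The sketch below builds a, b and c from an initial position and a constant velocity (both hypothetical test values), and then confirms by finite differences that the second derivative of $D_0V_0$ is zero and that $V_0$ is the derivative of $D_0$. The function names and step size are illustrative choices.

```python
import math

def quadratic_coefficients(p0, v):
    """a, b, c such that D**2 = a + 2*b*t + c*t**2 for position p0 + v*t."""
    a = sum(x * x for x in p0)
    b = sum(x * u for x, u in zip(p0, v))
    c = sum(u * u for u in v)
    return a, b, c

def slant_range(t, a, b, c):
    return math.sqrt(a + 2.0 * b * t + c * t * t)

def slant_rate(t, a, b, c):
    """V0 = (b + c*t) / D0, the expression given in the text."""
    return (b + c * t) / slant_range(t, a, b, c)

def feedback_term(t, a, b, c, h=1e-2):
    """Central second difference of D0*V0, approximating d^2/dt^2 (D V)."""
    def dv(s):
        return slant_range(s, a, b, c) * slant_rate(s, a, b, c)
    return (dv(t + h) - 2.0 * dv(t) + dv(t - h)) / h**2

def numerical_rate(t, a, b, c, h=1e-3):
    """Central first difference of D0, approximating dD0/dt."""
    return (slant_range(t + h, a, b, c) - slant_range(t - h, a, b, c)) / (2.0 * h)

# Hypothetical straight-line course: position in yards, velocity in yd per sec.
a, b, c = quadratic_coefficients((4000.0, -3000.0, 2000.0), (-150.0, 50.0, 0.0))
for t in (0.0, 5.0, 10.0):
    print(t, feedback_term(t, a, b, c), slant_rate(t, a, b, c), numerical_rate(t, a, b, c))
```

Since $D_0V_0 = b + ct$ exactly, the second difference is zero up to rounding, and equation (4) reduces to $V = dD/dt$.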


If D does not correspond exactly to straight line flight, either because of tracking errors or actual target maneuvers, on the other hand, the feedback voltage is no longer constant. In this case transmission around the loop can exist and the degenerative feedback action produces smoothing in both the input and the output voltage. In calculating the exact effect we must take account of the fact that the feedback voltage depends upon the D potentiometer in the feedback circuit as well as upon the output voltage V. Since the D potentiometer setting must include the errors in the input data, this means that the output voltage is not perfectly smoothed, even with unlimited gain around the loop. The percentage error in the output rate tends in the limit to approximate the percentage error in D itself. For practical purposes, however, this is a very satisfactory result, since in the absence of smoothing percentage errors in rates are usually many times those of the corresponding coordinates.

It is apparent that it should be possible to construct many circuits of this general type from the differential equations of the trajectory. A second example is furnished by Figure 4. The operation of the circuit is essentially similar to that of Figure 3. It depends upon the fact that in unaccelerated straight line motion the quantity $D^2\dot A\cos^2 E$ is a constant. Instead of multiplying by $D^2$ and $\cos^2 E$ at a single point in the feedback loop, however, separate multiplications by $D$ and $\cos E$ are introduced in the forward and feedback circuits. This permits the output to appear as a smoothed value of the quantity $D\dot A\cos E$, which will be recalled as one of the primary quantities in the circuit of Figure 1.

* The condensers in Figure 3 symbolize differentiation.

Figure 4. Another example of feedback smoothing with smoothing variable with position coordinates.

14.4 NETWORKS VARIABLE WITH TIME

In  addition  to  making  the  parameters  of  the 
data-smoothing  network  vary  as  functions  of 
the  coordinates  of  target  position  we  may  also 
make  them  variable  as  functions  of  time.  The 
advantage  of  variation  with  time  can  be  under- 
stood by  going  back  to  the  discussion  of  the 
analytic  arc  assumption  and  its  consequences 
for  fixed  data-smoothing  networks,  as  given  in 
Chapters  9,  10,  and  11.  It  will  be  recalled  that 
for  any  given  settling  time  there  was  an  opti- 
mum choice  of  the  network's  weighting  func- 
tion. The  choice  of  the  settling  time  itself,  how- 
ever, was  always  a  compromise.  On  the  one 
hand,  making  the  settling  time  too  short  led 
to  too  little  smoothing,  so  that  the  dispersion 
in  the  resulting  fire  became  excessive.  On  the 
other  hand,  too  long  a  settling  time  meant  that 
data  from  previous  unrelated  segments  were 
retained  in  the  smoothing  circuit  during  too 
large  a  proportion  of  an  average  individual  seg- 
ment of  the  target  path,  leaving  too  small  a 
residue  of  the  average  segment  as  useful  firing 
time. 

It  is  evident  that  it  is  theoretically  possible 
to  escape  the  consequences  of  this  compromise 
by  resorting  to  variable  structures.  We  need 
merely  assume  that  the  network  always  has  a 
weighting  function  appropriate  for  a  settling 
time  equal  to  the  time  since  the  last  change  in 
course.  This  would  give  a  small  amount  of 
smoothing  shortly  after  a  change  in  course, 
with  more  smoothing  and  consequently  greater 
accuracy  later  on.  No  firing  time,  however,  is 
sacrificed  waiting  for  the  network  to  settle. 

In  order  to  exploit  these  possibilities  we 
must,  of  course,  be  able  to  design  networks  to 
give  at  least  approximately  the  right  sequence 
of  weighting  function.  It  is  also  necessary  to 
provide  some  sort  of  auxiliary  controlling 
mechanism  which  will  sense  changes  in  target 
course  and  return  the  variable  circuits  in  the 
smoothing  network  proper  to  their  initial  posi- 
tions. These  are  both  difficult  problems  which 
have been incompletely explored. Some elementary
solutions, based principally upon modifications
of the degenerative feedback smoothing


circuit  of  Figure  2,  of  Chapter  10,  are,  how- 
ever, given  later  in  the  chapter.  As  a  prelimi- 
nary, the  next  section  gives  a  formal  extension 
of  the  general  polynomial  expansion  method  of 
Chapter  11  to  the  variable  case. 


14.5 GENERAL POLYNOMIAL SOLUTION
FOR  VARIABLE  NETWORKS 

The  extension  of  the  general  method  of 
Chapter  11  to  the  variable  case  requires  two 
modifications. 

1.  The  lower  limit  of  the  integral  to  be 
minimized  is  now  taken  as  zero,  in  anticipation 
of  the  possibility  of  discriminating  between  rele- 
vant and  irrelevant  data  on  the  basis  of  time  of 
arrival. 

2.  The  weighting  function  may  now  depend 
more  generally  upon  the  variable  of  integration 
and  the  upper  limit  of  integration. 

With  these  modifications  there  is  no  longer 
any advantage in conducting the analysis in
terms of the age variable $\tau$. To deal directly
with  the  minimization  of  the  integral 

$$\int_0^t \left[E(\lambda) - \bar{E}(\lambda)\right]^2 W_0(t,\lambda)\,d\lambda , \tag{5}$$

let

$$\bar{E}(\lambda) = V_0 + V_1 \cdot G_1(t,\lambda) + \cdots + V_n \cdot G_n(t,\lambda) , \tag{6}$$

where $G_m(t,\lambda)$ is an $m$th degree polynomial in
$\lambda$. Also, let

$$\int_0^t W_0(t,\lambda)\,d\lambda = 1$$

$$\int_0^t G_l(t,\lambda) \cdot G_m(t,\lambda) \cdot W_0(t,\lambda)\,d\lambda =
\begin{cases} 0 & \text{if } l \neq m \\ \dfrac{1}{k_m} & \text{if } l = m \end{cases}$$

$$(G_0 = 1,\; k_0 = 1) .$$

Then (5) is a minimum with respect to the
$V_m$'s in (6) if

$$V_m(t) = \int_0^t E(\lambda) \cdot W_m(t,\lambda)\,d\lambda \tag{7}$$

where

$$W_m(t,\lambda) = k_m G_m(t,\lambda) \cdot W_0(t,\lambda) . \tag{8}$$

The possibility of physically realizing the
$V_m(t)$ depends upon the possibility of realizing
networks with impulsive admittances $W_m(t,\lambda)$
in the sense that $W_m(t,\lambda)$ is the response of a
network, at time $t$, to a unit impulse applied at
time $\lambda$, where $0 < \lambda < t$. Taking this possibility
for granted, the predicted value $\bar{E}(t + t_f)$ is,
according to (6), a variable linear combination
of the $V_m(t)$, viz.,

$$\bar{E}(t + t_f) = V_0(t) + G_1(t, t + t_f) \cdot V_1(t) + \cdots + G_n(t, t + t_f) \cdot V_n(t) . \tag{9}$$
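The fit-and-extrapolate procedure of (5) through (9) can be sketched numerically for the simplest nontrivial case. This is a minimal illustration, not the monograph's circuit: it assumes a uniform $W_0(t,\lambda) = 1/t$, truncates the expansion at $n = 1$, takes $G_1(t,\lambda) = \lambda - t/2$ (which is orthogonal to $G_0 = 1$ under that weight), and replaces the integrals by midpoint sums. The function name `fit_and_predict` is hypothetical.

```python
def fit_and_predict(signal, t, t_f, n=2000):
    """Weighted least-squares fit of a first-degree polynomial on [0, t],
    then extrapolation to t + t_f, following the pattern of (6)-(9).

    Assumptions (not from the source): uniform W_0 = 1/t, expansion cut at
    degree 1, integrals approximated by the midpoint rule.
    """
    h = t / n
    nodes = [(i + 0.5) * h for i in range(n)]   # midpoint-rule abscissae
    w0 = 1.0 / t                                # W_0(t, lambda); integrates to 1

    def g1(lam):
        # G_1(t, lambda): first-degree polynomial orthogonal to G_0 = 1
        return lam - t / 2.0

    # k_1 is fixed by the normalization: integral of G_1^2 * W_0 equals 1 / k_1
    k1 = 1.0 / sum(g1(x) ** 2 * w0 * h for x in nodes)

    # (7): V_m = integral of E * W_m, with W_m = k_m * G_m * W_0 from (8)
    v0 = sum(signal(x) * w0 * h for x in nodes)
    v1 = sum(signal(x) * k1 * g1(x) * w0 * h for x in nodes)

    # (9): predicted value as a linear combination of the V_m
    return v0 + g1(t + t_f) * v1


if __name__ == "__main__":
    straight_line = lambda lam: 2.0 + 0.5 * lam
    print(fit_and_predict(straight_line, t=10.0, t_f=3.0))
```

For exactly linear data the extrapolated value coincides with the true future value, which is the sense in which the first-degree expansion has no systematic error on straight-line courses.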


It is clear that all of the $W_m(t,\lambda)$ as well as
all of the $G_m(t,\lambda)$ for $m = 1, 2, \ldots$ are determined
by $W_0(t,\lambda)$. The latter is determined as
the  best  weighting  function  for  position  data 
smoothing,  depending  upon  the  characteristics 
of  the  noise  associated  with  the  position  data. 
The  general  methods  of  determining  the  best 
weighting  function  with  fixed  smoothing  time, 
described  in  Chapter  10,  may  be  used  to  deter- 
mine the  best  weighting  function  with  variable 
smoothing  time. 

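Under the assumption that the spectrum of
the noise associated with the signal $S(t)$ has a
uniform slope of $6k$ db per octave, we may take
over from Section 11.3 the result that the best
weighting function is

$$w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2} \cdot \frac{\left[\lambda(t-\lambda)\right]^k}{t^{2k+1}} , \qquad 0 \le \lambda \le t . \tag{10}$$

The response of the network is then

$$V(t) = \int_0^t S(\lambda) \cdot w_k(t,\lambda)\,d\lambda . \tag{11}$$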
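As a quick check on the family of weighting functions, the sketch below evaluates $w_k(t,\lambda)$ and confirms numerically that each member has unit area on $0 \le \lambda \le t$. It assumes the reading $w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2}\,[\lambda(t-\lambda)]^k / t^{2k+1}$, which reproduces the uniform case for $k = 0$ and the parabolic case for $k = 1$; the helper names are hypothetical.

```python
from math import factorial


def w_k(k, t, lam):
    """Weighting function of order k at elapsed time t, evaluated at lam.

    Assumed form: (2k+1)!/(k!)^2 * [lam (t - lam)]^k / t^(2k+1) on 0 <= lam <= t,
    and zero outside that interval.
    """
    if lam < 0.0 or lam > t:
        return 0.0
    coeff = factorial(2 * k + 1) / factorial(k) ** 2
    return coeff * (lam * (t - lam)) ** k / t ** (2 * k + 1)


def area(k, t, n=20000):
    """Midpoint-rule approximation to the integral of w_k over [0, t]."""
    h = t / n
    return sum(w_k(k, t, (i + 0.5) * h) * h for i in range(n))


if __name__ == "__main__":
    for k in range(4):
        print(k, area(k, 4.0))
```

Because the area is unity for every $t$, the response (11) remains a true weighted average of the signal however long the network has been running.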


14.6 SPECIAL CASES


It  will  be  illuminating  to  consider  a  few 
special  cases  of  (11). 
For $k = 0$, we have

$$V(t) = \frac{1}{t}\int_0^t S(\lambda)\,d\lambda . \tag{12}$$

Multiplying through by $t$ and differentiating
we get

$$t\dot{V}(t) + V(t) = S(t) . \tag{13}$$

This suggests the circuit shown in Figure 5.$^f$
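A discrete analogue makes the behavior of (13) concrete. Writing the equation as $\dot{V} = (S - V)/t$ and stepping in unit intervals gives a recursion whose gain $1/n$ shrinks as data accumulate; its output is exactly the running mean of the samples, the sampled counterpart of (12). This is only an illustrative sketch with a hypothetical function name, not a model of the circuit of Figure 5.

```python
def variable_smoother(samples):
    """Discrete counterpart of t * dV/dt + V = S.

    At step n the correction (S - V) is applied with gain 1/n, so early samples
    move the output strongly and later samples move it less and less.
    """
    v = 0.0
    outputs = []
    for n, s in enumerate(samples, start=1):
        v += (s - v) / n          # dV = (S - V) * dt / t with dt = 1, t = n
        outputs.append(v)
    return outputs


if __name__ == "__main__":
    print(variable_smoother([3.0, 5.0, 4.0, 8.0]))
```

The output after $n$ samples equals the arithmetic mean of those $n$ samples, so no time is lost waiting for a fixed network to settle.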
For $k = 1$, we have

$$V(t) = \frac{6}{t^3}\int_0^t S(\lambda) \cdot \lambda(t - \lambda)\,d\lambda .$$

Multiplying through by $t^3$ and differentiating
twice we get

$$\frac{t^2}{6}\ddot{V} + t\dot{V} + V = S$$
which  may  be  written  in  the  form 

This  suggests  the  network  shown  in  Figure  6.« 
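The two differentiations leading to the second-order equation for $k = 1$ can be written out as follows. This is a worked sketch assuming the parabolic weighting $6\lambda(t-\lambda)/t^3$; the boundary terms vanish because the integrand is zero at $\lambda = t$ in the first step.

```latex
\begin{aligned}
t^{3} V(t) &= 6\int_0^t S(\lambda)\,\lambda\,(t-\lambda)\,d\lambda \\
\frac{d}{dt}\left[t^{3} V\right] &= 6\int_0^t S(\lambda)\,\lambda\,d\lambda \\
\frac{d^{2}}{dt^{2}}\left[t^{3} V\right] &= 6\,t\,S(t) \\
t^{3}\ddot{V} + 6t^{2}\dot{V} + 6tV &= 6\,t\,S(t) \\
\frac{t^{2}}{6}\ddot{V} + t\dot{V} + V &= S
\end{aligned}
```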


14.7 


NETWORKS  WITH  A  LIMITED 
RANGE  OF  VARIATION 


By  generalizing  the  above  results  in  various 
ways  a  large  number  of  other  examples  of 
variable  smoothing  networks  can  be  constructed. 
Since  unlimited  variation  in  the  smoothing 
time  is  not  practically  possible,  or  perhaps  even 
tactically  optimal,  however,  it  is  desirable  in 
discussing  any  further  examples  to  include  also 
the  possibility  that  the  range  of  variation  in 
the network may be restricted. For any positive
integral value of $k$ in (11) the differential
equation for $V(t)$ is of the type which may be
reduced by the transformation $t = e^{x}$ to a linear
differential equation with constant coefficients.$^h$
In general, this facilitates the determination of
what happens to the weighting function
$w_k(t,\lambda)$ when $t > T$ if the variability of the
network  is  stopped  at  time  T.  In  the  case  of  the 
first-order  equation  (13),  however,  it  is  just 
as  easy  to  deal  directly  in  terms  of  the  natural 
time. 

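A more general form for (13), which readily
yields the effects of a sudden or gradual stoppage
of the variability of the network, is

$$\frac{\phi(t)}{\dot{\phi}(t)}\,\dot{V}(t) + V(t) = S(t) . \tag{14}$$

This corresponds to the response

$$V(t) = \frac{1}{\phi(t)}\int_0^t S(\lambda)\,\dot{\phi}(\lambda)\,d\lambda ,$$

whence the weighting function is

$$w(t,\lambda) = \frac{\dot{\phi}(\lambda)}{\phi(t)} . \tag{15}$$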


$^f$ This circuit is due to S. Darlington.

« Due to B. T. Weber.

$^h$ See Section A.11 for a more general transformation.
The general relation (14) may be realized
with the network of Figure 5, by varying the
resistance in accordance with

$$R = \frac{1}{C} \cdot \frac{\phi(t)}{\dot{\phi}(t)} , \qquad t > 0 .$$

However, a more practical circuit results from
the introduction of variable potentiometers$^i$ in
both the capacity and resistance paths of the
original feedback smoothing circuit of Figure
2, Chapter 10. This is shown in Figure 7.$^j$ It
may be noted that the feedback circuit is also
applicable to the two cases discussed in the
preceding section. It has the advantage for
these applications that it does not require the
zero-impedance generators and infinite-impedance
loads of Figures 5 and 6.

Figure 5. Time-variable smoothing circuit giving
uniform weighting function.


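Figure 6. Time-variable smoothing circuit giving
parabolic weighting function.

Figure 7. Limited range time-variable feedback
smoothing circuit.

As an example of (14) we may take

$$\phi(t) = \begin{cases} t & 0 < t < T \\ T e^{(t-T)/T} & t > T . \end{cases}$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = \begin{cases} t & 0 < t < T \\ T & t > T . \end{cases}$$

Hence, in Figure 7, if $RC = T$,

$$f_C(t) = \frac{t}{T}, \quad f_R(t) = 0 \qquad 0 < t < T$$
$$f_C(t) = 1, \quad f_R(t) = 1 \qquad t > T .$$

This example obviously calls for a linear potentiometer
in the condenser path and a switch in
the resistance path. The weighting function obtained
is, by (15),

$$w(t,\lambda) = \begin{cases}
\dfrac{1}{t} & 0 < \lambda < t < T \\[1ex]
\dfrac{1}{T}\,e^{-(t-T)/T} & 0 < \lambda < T < t \\[1ex]
\dfrac{1}{T}\,e^{-(t-\lambda)/T} & 0 < T < \lambda < t .
\end{cases}$$

This is illustrated in Figure 8 for $T = 10$, $t = 5$,
10, 20.

$^i$ In some cases a variable potentiometer may turn
out to be a switch.

$^j$ This circuit is due to S. Darlington.

Figure 8. First example of weighting function
produced by circuit of Figure 7.

A second example is furnished by taking

$$\phi(t) = \begin{cases} t^k & 0 < t < T \\ T^k e^{k(t-T)/T} & t > T . \end{cases}$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = \begin{cases} \dfrac{t}{k} & 0 < t < T \\[1ex] \dfrac{T}{k} & t > T . \end{cases}$$

Hence, in Figure 7, if $RC = T/k$,

$$f_C(t) = \frac{t}{T}, \quad f_R(t) = 1 - \frac{1}{k} \qquad 0 < t < T$$
$$f_C(t) = 1, \quad f_R(t) = 1 \qquad t > T .$$

The first example is a special case of this one.
The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases}
\dfrac{k\lambda^{k-1}}{t^k} & 0 < \lambda < t < T \\[1ex]
\dfrac{k\lambda^{k-1}}{T^k}\,e^{-k(t-T)/T} & 0 < \lambda < T < t \\[1ex]
\dfrac{k}{T}\,e^{-k(t-\lambda)/T} & 0 < T < \lambda < t .
\end{cases}$$

This is illustrated in Figure 9 for $k = 3/2$,
$T = 10$, $t = 5$, 10, 20.

Figure 9. Second example of weighting function
produced by circuit of Figure 7 ($k = 3/2$, $T = 10$).

A third example is furnished by taking

$$\phi(t) = \begin{cases} \dfrac{t}{2 - t/T} & 0 < t < T \\[1ex] T e^{2(t-T)/T} & t > T . \end{cases}$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = \begin{cases} t\left(1 - \dfrac{t}{2T}\right) & 0 < t < T \\[1ex] \dfrac{T}{2} & t > T . \end{cases}$$

Hence, in Figure 7, if $RC = T/2$,

$$f_C(t) = \frac{2t}{T}\left(1 - \frac{t}{2T}\right), \quad f_R(t) = \frac{t}{T} \qquad 0 < t < T$$
$$f_C(t) = 1, \quad f_R(t) = 1 \qquad t > T .$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases}
\dfrac{1}{t} \cdot \dfrac{1 - \dfrac{t}{2T}}{\left(1 - \dfrac{\lambda}{2T}\right)^2} & 0 < \lambda < t < T \\[2ex]
\dfrac{1}{2T} \cdot \dfrac{e^{-2(t-T)/T}}{\left(1 - \dfrac{\lambda}{2T}\right)^2} & 0 < \lambda < T < t \\[2ex]
\dfrac{2}{T}\,e^{-2(t-\lambda)/T} & 0 < T < \lambda < t .
\end{cases}$$

This is illustrated in Figure 10 for $T = 10$,
$t = 5$, 10, 20.

Figure 10. Third example of weighting function
produced by circuit of Figure 7.

A fourth example is furnished by taking

$$\phi(t) = e^{kt} - 1 \qquad t > 0 .$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = \frac{1 - e^{-kt}}{k} \qquad t > 0 .$$

Hence, in Figure 7, if $RC = 1/k$,

$$f_C(t) = f_R(t) = 1 - e^{-kt} \qquad t > 0 .$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \frac{k}{1 - e^{-kt}}\,e^{-k(t-\lambda)} \qquad 0 < \lambda < t .$$

For any value of $t$ this weighting function is
exponential in $\lambda$.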
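The four examples can be checked numerically from the single rule $w(t,\lambda) = \dot{\phi}(\lambda)/\phi(t)$. The sketch below assumes the following readings of the example functions: $\phi = t$, $t^k$, $t/(2 - t/T)$ up to time $T$ with exponential continuation afterward, and $\phi = e^{kt} - 1$ for the fourth. The names `make_phi`, `weight` and `total_weight` are hypothetical, and the integral is a plain midpoint sum.

```python
from math import exp


def make_phi(example, T=10.0, k=1.5):
    """Return (phi, dphi) for one of the four examples; dphi is d(phi)/dt.

    The closed forms are assumed readings of the text, chosen so that phi and
    its derivative are continuous at t = T.
    """
    if example == 1:
        phi = lambda t: t if t <= T else T * exp((t - T) / T)
        dphi = lambda t: 1.0 if t <= T else exp((t - T) / T)
    elif example == 2:
        phi = lambda t: t ** k if t <= T else T ** k * exp(k * (t - T) / T)
        dphi = lambda t: (k * t ** (k - 1) if t <= T
                          else k * T ** (k - 1) * exp(k * (t - T) / T))
    elif example == 3:
        phi = lambda t: t / (2.0 - t / T) if t <= T else T * exp(2.0 * (t - T) / T)
        dphi = lambda t: (2.0 / (2.0 - t / T) ** 2 if t <= T
                          else 2.0 * exp(2.0 * (t - T) / T))
    elif example == 4:
        phi = lambda t: exp(k * t) - 1.0
        dphi = lambda t: k * exp(k * t)
    else:
        raise ValueError("example must be 1, 2, 3 or 4")
    return phi, dphi


def weight(phi, dphi, t, lam):
    """Weighting function (15): dphi(lam) / phi(t) for 0 <= lam <= t, else 0."""
    if lam < 0.0 or lam > t:
        return 0.0
    return dphi(lam) / phi(t)


def total_weight(phi, dphi, t, n=20000):
    """Midpoint-rule integral of the weighting function over 0 <= lam <= t."""
    h = t / n
    return sum(weight(phi, dphi, t, (i + 0.5) * h) * h for i in range(n))


if __name__ == "__main__":
    for example in (1, 2, 3, 4):
        phi, dphi = make_phi(example)
        print(example, [round(total_weight(phi, dphi, t), 6) for t in (5.0, 10.0, 20.0)])
```

In every case the total weight is unity at each $t$, which follows from $\phi(0) = 0$; this is why the fourth example uses $e^{kt} - 1$ rather than $e^{kt}$.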



Because  there  has  been  no  demand  for  varia- 
ble networks  in  the  field  of  communications, 
the  technique  of  designing  practical  variable 
networks  is  in  a  very  rudimentary  stage  com- 
pared to  that  of  designing  fixed  networks.  In 
the remainder of this chapter we shall describe
some of the circuits which have been developed
for  specific  practical  applications. 

A memory point method of obtaining
smoothed rates, based upon (12), is illustrated
below. If $S(t)$, the quantity to be smoothed,
represents the time derivative $\dot{E}(t)$ of the position
data $E(t)$, then the average rate is given
by

$$V(t) = \frac{1}{t}\int_0^t \dot{E}(\lambda)\,d\lambda = \frac{E(t) - E(0)}{t} .$$

Under the assumption that the position data,
aside from tracking errors, is a linear function
of time, the average rate is also the smoothed
rate. If the position data is represented by the
angular displacement of a shaft in the computer,
the quantity $E(0)$ is readily fixed by
providing a second shaft which is coupled to
the first shaft until $t = 0$ when the coupling is
broken. Potentiometers mounted on the shafts
are energized by a voltage varying as a function
of time in the manner indicated in Figure
11. The manner in which the smoothed rate is
obtained is clear.

Figure 11. Memory point method of obtaining
smoothed rate.
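The memory point idea reduces to a single division, and a short sketch shows why it smooths. Assuming the position is a straight line plus a bounded tracking error, the rate taken from the memory point $E(0)$ to the present position $E(t)$ differs from the true rate by at most twice the error bound divided by $t$. The function name and the sinusoidal "tracking error" below are illustrative assumptions only.

```python
from math import sin


def memory_point_rate(position, t):
    """Average rate since the memory point at t = 0: (E(t) - E(0)) / t."""
    return (position(t) - position(0.0)) / t


if __name__ == "__main__":
    true_rate = 4.0
    error_bound = 0.5
    # Straight-line course plus a bounded, deterministic stand-in for tracking error.
    position = lambda t: 100.0 + true_rate * t + error_bound * sin(7.0 * t)
    for t in (1.0, 10.0, 100.0):
        print(t, memory_point_rate(position, t))
```

The error in the rate falls off as $1/t$, so the estimate keeps improving for as long as the target holds its course.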


The memory point method of obtaining
smoothed rates is used in the T15 antiaircraft
director.4 In this application, however, it is
somewhat more complicated than in the simple
illustration described above. This is due to the
fact that the position data and the memory
point are in the polar coordinate system,
whereas the rate components are referred to
a tilted and rotating rectangular coordinate
system which is determined by the instantaneous
line of sight.


Figure 12 shows a way of securing variable
smoothing in a purely electrical circuit.$^k$ Except
for the fact that the division of the current
through the condensers is varied discontinuously
instead of continuously, this circuit corresponds
to the first or the second example discussed
in Section 14.7.

Figure 12. Specific limited range time-variable
feedback smoothing circuit.

Figure 13 shows the variable smoothing circuit$^l$
for smoothing first derivatives in the
M9A1-E1 antiaircraft director.8 This circuit
corresponds approximately to the second example
of the differential equation (14) given
above. The variable element is a thermistor
which is heated up to a high temperature, practically
instantaneously, by the heater, and then
allowed to cool off naturally. By choosing the
electrical and thermal constants in the circuit
correctly the resulting smoothing can be made
to approximate that obtained in a memory
point circuit.

Figure 13. Another specific limited range time-variable
feedback smoothing circuit.

$^k$ This circuit is due to S. Darlington.

$^l$ Developed by R. F. Wick.

As  noted  earlier,  all  these  variable  circuits 
require  some  auxiliary  control  means  to  reset 
the  variable  circuits  to  zero  whenever  a  new 
target  is  engaged  or  the  current  target  makes 
a  sudden  change  in  course.  In  the  T15  memory 
point  system  this  function  was  performed  by  an 
operator.  The  operator  was  aided  by  a  series  of 
meters  which  compared  the  instantaneous 
memory  point  rates  with  average  rates  set  in 
some  time  previously  by  hand.  The  visual  in- 
dication of  a  change  in  course,  calling  for  the 
selection  of  a  new  memory  point,  was  a  rela- 
tively large,  smoothly  and  decisively  varying 
deflection  on  the  meters.  In  contrast,  normal 
tracking  errors  appeared  as  relatively  small 
random  fluctuations  of  the  needles.  The  circuits 
of  Figures  7  and  12,  which  were  intended  for 
bombsight  applications,  were  also  under  the 
control  of  an  operator,  who  was  supposed  to 
start  the  mechanism  at  the  beginning  of  each 
bombing  run. 

Two  control  methods  were  used  for  the  cir- 
cuit of  Figure  13.  In  one,  large  changes  in  rate, 
corresponding  to  probable  changes  in  target 


course,  were  distinguished  by  comparing  the 
instantaneous  value  of  the  target  rate,  as  ob- 
tained directly  from  a  differentiator,  with  the 
smoothed  value  obtained  at  the  output  of  the 
smoothing  circuit.  In  the  other  method,  equiva- 
lent information  was  obtained  by  again  differ- 
entiating the  instantaneous  value  of  the  target 
rate,  making  a  second  derivative  of  the  target 
coordinate.  In  either  case  this  rate  difference 
or  second  derivative  information  was  used  to 
control  a  gas  tube,  which  went  off,  supplying 
heating  current  to  the  variable  thermistor, 
whenever  the  voltage  applied  to  it  exceeded  a 
certain  threshold.   This  threshold  evidently 
marks  the  minimum  change  in  course  for  which 
the  variable  network  will  be  reset.  In  order  to 
permit  the  use  of  a  low  threshold,  without 
making  the  circuit  unduly  liable  to  false  opera- 
tion because  of  the  effect  of  tracking  errors, 
the  gas  tube  input  voltage  was  first  transmitted 
through  a  low-pass  filter  which  suppressed 
most  of  the  energy  due  to  tracking  errors.  A 
considerable  amount  of  work  was  done  on  the 
proportioning  of  this  filter  to  provide  the  best 
protection  against  false  operation  with  a  low 
threshold  and  with  minimum  delay  in  resetting 
in  case  a  change  of  course  actually  does  occur, 
but  the  problem  remains  an  interesting  subject 
for  research. 
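The first control method lends itself to a simple software analogue: form the difference between the instantaneous and smoothed rates, pass it through a low-pass filter, and call for a reset when the filtered value exceeds a threshold. The sketch below is only an illustration of that logic, assuming a one-pole discrete filter with coefficient `alpha` and an arbitrary threshold; none of these values or names comes from the M9A1-E1 design.

```python
def first_reset_index(instantaneous, smoothed, alpha=0.2, threshold=1.0):
    """Return the first sample index at which a reset would be called, or None.

    The rate difference is low-pass filtered (one-pole filter, an assumed
    stand-in for the filter ahead of the gas tube) and compared in magnitude
    with the threshold.
    """
    filtered = 0.0
    for n, (rate, smooth) in enumerate(zip(instantaneous, smoothed)):
        filtered += alpha * ((rate - smooth) - filtered)   # low-pass step
        if abs(filtered) > threshold:                      # "gas tube fires"
            return n
    return None


if __name__ == "__main__":
    jitter = [0.5 if n % 2 == 0 else -0.5 for n in range(100)]
    smoothed = [10.0] * 100
    steady = [10.0 + j for j in jitter]
    turning = [10.0 + j + (3.0 if n >= 50 else 0.0) for n, j in enumerate(jitter)]
    print(first_reset_index(steady, smoothed))    # jitter alone never trips
    print(first_reset_index(turning, smoothed))   # trips shortly after sample 50
```

Lowering `alpha` gives more protection against false operation at the cost of a longer delay before a genuine change of course is recognized, which is the compromise described in the text.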
NETWORK THEORY


THIS  APPENDIX  GIVES  a  summary  of  linear 
network  theory  which  is  pertinent  to  the 
analysis  and  design  of  data-smoothing  and 
prediction  circuits.  It  is  incomplete  in  many 
respects  and  should  therefore  be  supplemented 
by  reference  to  established  textbooks  on  the 
subject.  However,  it  contains  some  results 
which  are  new. 

The  present  summary  will  be  concerned 
mainly  with  fixed  linear  networks.  Variable 
linear  networks  will  be  considered  briefly  in 
the  last  section. 


A.1 IMPULSIVE ADMITTANCE

A  fixed  linear  transmission  network  is  one  in 
which  the  response  V(t)  is  related  to  the  im- 
pressed signal  E(t)  by  a  linear  differential 
equation  of  the  form 

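$$b_n\frac{d^nV}{dt^n} + b_{n-1}\frac{d^{n-1}V}{dt^{n-1}} + \cdots + b_0V
= a_m\frac{d^mE}{dt^m} + a_{m-1}\frac{d^{m-1}E}{dt^{m-1}} + \cdots + a_0E \tag{1}$$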

with  constant  coefficients.  It  is  well-known  that 
the  solutions  of  such  a  differential  equation 
obey  the  "superposition  principle."  This  makes 
it  possible  to  formulate  the  response  of  the  net- 
work to  any  signal,  in  terms  of  its  response  to 
certain  standard  signals. 

A  convenient  standard  signal  for  analytical 
purposes  is  the  "unit  impulse."  It  may  be  re- 
garded as  the  limit  of  the  rectangular  pulse 
shown  in  Figure  1  as  the  duration  of  the  pulse 


»  i  1 

Figure  1.    Rectangular  puise  signal. 

is  decreased  indefinitely  while  the  amplitude  is 
increased  in  such  a  way  that  the  area  under 
the  pulse  is  always  unity.  The  limiting  function 
thus  denned  does  not  exist  in  a  strict  mathe- 
matical sense.  However,  it  is  very  convenient 
for  analytical  purposes,  and  seldom  leads  to 
difficulties,  to  proceed  as  though  the  limiting 
function  did  exist.  An  impulse  occurring  at 


t  =  a  is  conventionally  denoted  by  the  singular 
function  Su(t  —  A)  where 

«o(t)  =  0   if  r  ^  0 
J  ha(r)dr  =0    if  t  <  0 
si     if  t>  0 

The response of a fixed network to an impulse
or any form of signal is independent of
the time at which the signal is applied, provided
it is expressed as a function of the time relative
to the application of the signal. Let $W(t)$ be
the response to the signal $\delta_0(t)$. This is called
the "impulsive admittance" of the network.
Physically, it must be identically zero for negative
values of $t$. For an impulse applied at $t = \lambda$
the response will therefore be $W(t - \lambda)$, which
is identically zero for $t < \lambda$.

A physical signal $E(t)$ such as the one shown
in Figure 2 may be resolved into an infinite
succession of elementary impulses. The strength
of the typical elementary impulsive component,
such as the one shown in Figure 2 as occurring
at time $\lambda$, is $E(\lambda)\,d\lambda$. Its contribution to the
response at time $t$ is $E(\lambda) \cdot W(t - \lambda)\,d\lambda$. Hence
the contribution of all the elementary impulsive
components of the signal, to the response at
time $t$, is given by the formula

$$V(t) = \int_{-\infty}^{+\infty} E(\lambda) \cdot W(t - \lambda)\,d\lambda . \tag{2}$$

This is one form of the "superposition theorem"
for fixed linear networks.

Figure 2. Derivation of superposition theorem.
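The superposition integral can be illustrated by direct numerical summation of elementary impulse contributions. The sketch below assumes a signal that is zero before a known start time and an impulsive admittance that is zero for negative argument, so that the infinite range of integration collapses to the interval from the start time to $t$. The RC low-pass admittance used in the example is an illustrative choice, and the function name is hypothetical.

```python
from math import exp


def response(signal, admittance, t, start=0.0, n=4000):
    """Approximate V(t) = integral of E(lam) * W(t - lam) d(lam).

    Assumes E is zero before `start` and W is zero for negative argument, so
    only start <= lam <= t contributes. Midpoint-rule summation.
    """
    h = (t - start) / n
    total = 0.0
    for i in range(n):
        lam = start + (i + 0.5) * h
        total += signal(lam) * admittance(t - lam) * h   # one elementary impulse
    return total


if __name__ == "__main__":
    rc = 2.0
    W = lambda tau: exp(-tau / rc) / rc if tau >= 0.0 else 0.0   # RC low-pass
    step = lambda lam: 1.0 if lam >= 0.0 else 0.0                # unit step signal
    print(response(step, W, 3.0), 1.0 - exp(-3.0 / rc))
```

For a unit step applied to this admittance the summed response approaches the familiar exponential rise toward unity.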

Before discussing the reasons for the limits
of integration indicated in (2), it will be helpful
to consider a graphical interpretation other
than the one used in deriving the integral. Let
$W(t)$ be of the form shown in Figure 3, and let
$E(\lambda)$ be of the form shown in Figure 4. To
determine the response $V(t)$ at a given value
of $t$, the curve in Figure 3 is turned over from
right to left and placed over the curve in Figure
4 so that its right-hand edge is at $\lambda = t$. The
product of the two curves gives a third curve
(not shown), which is identically zero for all
$\lambda > t$. The area under the third curve is the response
$V(t)$ at the given value of $t$. For progressively
larger values of $t$, the curve representing
$W(t - \lambda)$ in Figure 4 is simply slid to
the right with respect to the curve representing
$E(\lambda)$.

Figure 3. An impulsive admittance.


Figure 4. Graphical interpretation of superposition
theorem.

Since a physical signal must certainly be
identically zero up to some definite time, or
since it must certainly have been applied to the
network at some definite time, that time could
be taken arbitrarily as zero and (2) could be
written in the form

$$V(t) = \int_0^t E(\lambda) \cdot W(t - \lambda)\,d\lambda . \tag{3}$$

In this form, however, since

$$\int_0^t W(t - \lambda)\,d\lambda$$

is in general a function of $t$, the response could
not be interpreted as a weighted average of the
signal. On the other hand, since

$$\int_{-\infty}^{+\infty} W(t - \lambda)\,d\lambda = \int_0^{\infty} W(\tau)\,d\tau$$

is independent of $t$, the response may be interpreted
as a weighted average of the signal, if

$$\int_0^{\infty} W(\tau)\,d\tau = 1 .$$

The necessity of taking the lower limit in (2)
as $-\infty$, in order to permit the interpretation
of the response as a weighted average of the
signal, is also expressed by the point of view
that a fixed network cannot make any physical
distinction between having no applied signal
and having an applied signal which happens to
be of zero amplitude.

Another shortcoming of the form (3) or, for
that matter, of the form (2) if we set $t$ as the
upper limit of integration, comes from the consideration
of impulsive admittances of such a
nature that $W(t - \lambda)$ has certain kinds of singularities
at $\lambda = t$. For example, the case for
direct transmission, expressed in the form

$$V(t) = \int_{-\infty}^{t} E(\lambda) \cdot \delta_0(t - \lambda)\,d\lambda$$

is ambiguous because the singularity in the
integrand occurs exactly at one end of the
range of integration. However, the form

$$V(t) = \int_{-\infty}^{+\infty} E(\lambda) \cdot \delta_0(t - \lambda)\,d\lambda$$

leads, without ambiguity, to the result
$V(t) = E(t)$. This example is not trivial. Every
network which transmits infinite frequency
must have an impulsive admittance of such a
nature that $W(t - \lambda)$ contains a singularity of
the form $\delta_0(t - \lambda)$. Any attempt to rule out such
a singularity on the ground that physical networks
cannot in fact transmit infinite frequency,
complicates the analysis and design of
networks unduly. If a network is capable of,
or is expected to transmit frequencies at the
top of the range of interest or importance, it is
simpler to assume that the network is capable
of, or is expected to transmit all frequencies
above that range.


One other advantage of taking the limits of
integration as indicated in (2) may be called
to attention. Keeping in mind that $E(\lambda)$ is
identically zero for all values of $\lambda$ below some
definite though perhaps unknown value, and
that $W(t - \lambda)$ is identically zero for all values
of $\lambda > t$, it is clear that (2) may be integrated
partially any number of times without incurring
the burden of carrying a string of terms
outside of the integral. After one partial integration
we have

$$V(t) = \int_{-\infty}^{+\infty} E'(\lambda) \cdot A(t - \lambda)\,d\lambda \tag{4}$$

where

$$A(t) = \int_{-\infty}^{t} W(\tau)\,d\tau . \tag{5}$$

Since $E'(\lambda)$ is identically zero for all values of
$\lambda$ in which $E(\lambda)$ is identically zero, and since
$A(t - \lambda)$ is identically zero for all values of
$\lambda > t$, a second partial integration may be performed
with no more formal complication than
the first partial integration. The fact of the
matter is that the terms which ordinarily arise
in partial integrations, outside of the integral,
are here carried under the integral by singularities
of the integrand.

The superposition theorem in the form (4)
may be derived directly in a manner similar to
the derivation of (2). $A(t - \lambda)$ is the response
of the network to a Heaviside unit step function
$H(t - \lambda)$ applied at $t = \lambda$, where

$$H(t - \lambda) = \begin{cases} 0 & \text{when } t < \lambda \\ 1 & \text{when } t > \lambda . \end{cases}$$

The signal is resolved into an infinite succession
of elementary step functions of amplitude
$E'(\lambda)\,d\lambda$ wherever $E(\lambda)$ is continuous, and
finite step functions of amplitude $dE(\lambda)$ wherever
$E(\lambda)$ has a finite discontinuity. The contribution
of each elementary step function to the
response at time $t$ is $E'(\lambda) \cdot A(t - \lambda)\,d\lambda$, that
of each finite step function is $A(t - \lambda) \cdot dE(\lambda)$.
Hence, the response is given formally by (4)
with the understanding that $E'(\lambda)\,d\lambda$ is to be
interpreted as $dE(\lambda)$ wherever $E(\lambda)$ is discontinuous.*

The  response  A  (t)  of  the  network  to  a 
Heaviside  unit  step  function  H(t)  applied  at 
$t = 0$ is called the "indicial admittance" of the
network.  It  is  more  familiar,  in  the  field  of 
linear  transmission  theory,  than  the  impulsive 
admittance  to  which  it  is  related  by  (5),  but  in 
this  monograph  preference  is  given  to  the  use 
of  the  impulsive  admittance.  In  the  theory  of 
linear  differential  equations  the  impulsive  ad- 
mittance is  known  as  a  Green's  function. 

It is often convenient to express the response
so that the variable of integration represents
the age of the elementary components of the
signal. Introducing the age variable

$$\tau = t - \lambda \tag{6}$$

into (2), we have

$$V(t) = \int_{-\infty}^{+\infty} E(t - \tau) \cdot W(\tau)\,d\tau . \tag{7}$$

* Formula (4) may be written in the Stieltjes form

$$V(t) = \int A(t - \lambda)\,dE(\lambda) .$$

Alternatively, we may take the point of view that
$E'(\lambda)$ contains impulsive singularities wherever $E(\lambda)$
is discontinuous. This point of view is generalized in
Appendix B.


In  this  form  it  is  clear  that  the  weighting  of 
signal  components  is  on  the  basis  of  age  only. 
A  fixed  network  may  be  said  to  have  a  memory 
which  is  a  function  only  of  the  age  of  past 
events. 

In the preliminary stages of designing a
smoothing network, the weighting function
$W(\tau)$ is generally prescribed to be identically
zero when $\tau > T$ say, as well as when $\tau < 0$.
This does not violate the conditions of physical
realizability. However, such a weighting function
cannot be obtained exactly with a network
of a finite number of discrete impedance elements.
A finite network invariably yields a
weighting function with a "tail" which extends
to infinity.

A.2 TRANSMISSION FUNCTION

Theoretically,  the  impulsive  admittance  of  a 
prescribed  network  may  be  determined  directly 
from  the  differential  equations  of  the  network 
in  a  perfectly  straightforward  manner.  Prac- 
tically, however,  it  is  very  difficult  to  do  so  if 
the  network  has  more  than  two  meshes.  Fur- 
thermore, the  technical  problem  of  designing 
a  network  directly  from  a  prescribed  impulsive 
admittance  is  even  more  difficult,  particularly 
if  the  impulsive  admittance  is  not  exactly  re- 
alizable. 

These  difficulties  may  be  avoided  by  recourse 
to  the  highly  developed  methods  of  network 
analysis  and  synthesis  used  in  the  field  of  com- 
munication circuits.  These  methods  are  based 
upon  the  steady-state  properties  of  networks. 

If a signal consisting of the single sinusoid
$\cos \omega t$ is applied to an invariable or fixed
linear transmission network, the steady-state
response$^b$ will also be a single sinusoid of the
same frequency. The amplitude and phase of
the response, relative to the signal, will in
general depend upon the frequency. The response
may be regarded as the resultant of an
"inphase component" proportional to $\cos \omega t$,
and a "quadrature component" proportional to
$\sin \omega t$, with amplitude coefficients which are
functions of the frequency. Furthermore, since
the signal is an even function of the frequency,
the response should also be an even function
of the frequency.$^c$ Hence, the response will
be of the form $G(\omega^2) \cos \omega t - \omega H(\omega^2) \sin \omega t$,
where $G$ and $H$ are even real functions of frequency.

$^b$ This is the response apart from transient components,
assuming that the latter vanish exponentially
with time after the signal is impressed.

$^c$ The signal is also an even function of the time but
this is due only to the particular choice of origin which
is arbitrary.

By a suitable shift of the origin of time it
follows that if the impressed signal is $\sin \omega t$,
the steady-state response will be of the form
$G(\omega^2) \sin \omega t + \omega H(\omega^2) \cos \omega t$.

These two results may be combined into a
simpler expression without any loss of individuality.
Since $e^{i\omega t} = \cos \omega t + i \sin \omega t$ where
$i = \sqrt{-1}$, we have

$$V(t) = \left[G(\omega^2) + i\omega H(\omega^2)\right] \cdot e^{i\omega t} \quad \text{if } E(t) = e^{i\omega t} .$$

A further simplification may be achieved by replacing
$i\omega$ by $p$, and $G(-p^2) + pH(-p^2)$ by
$Y(p)$, so that

$$V(t) = Y(p) \cdot e^{pt} \quad \text{if } E(t) = e^{pt} . \tag{8}$$

Y  (p)  is  called  the  "steady-state  transmission 
function"  or  just  "transmission  function"  for 
short. 

Strictly speaking, (8) expresses the relation
of steady-state response to signal only if $p = i\omega$.
However,  it  is  customarily  called  a  steady-state 
relation  even  when  p  is  not  a  pure  imaginary 
quantity.  It  may  be  noted  that  Y(p)  is  real 
when  p  is  real. 

The simplicity of steady-state analysis derives from the fact that time occurs in the
signal and throughout the network only in the form $e^{pt}$. In particular, the determination of
the transmission function is reduced to the solution of simultaneous algebraic equations
which do not involve the time factor. For a network in which the signal and the response are
related by the linear differential equation (1) with constant coefficients, we obtain simply
$$ Y(p) = \frac{a_0 + a_1 p + \cdots + a_m p^m}{b_0 + b_1 p + \cdots + b_n p^n}. $$

It  may  be  noted  that  the  poles  of  the  transmis- 
sion function,  also  referred  to  as  "infinite-gain 
points"  in  the  p-plane,  correspond  to  the  roots 
of  the  characteristic  function  of  the  differential 
equation.  Physical  restrictions  on  the  location 
of  infinite-gain  points  will  be  considered  in  Sec- 
tion A.9. 
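
As a rough numerical illustration of the steady-state relation (8), the sketch below evaluates a rational transmission function at $p = i\omega$ and forms the response to $\cos \omega t$ as the real part of $Y(i\omega) e^{i\omega t}$. The function names and the first-order example network (V + 0.5 dV/dt = E) are assumptions chosen only for illustration; they do not come from the text.

```python
import cmath
import math

def transmission(a, b, p):
    """Y(p) as a ratio of polynomials; a and b list coefficients in ascending powers of p."""
    num = sum(a_k * p**k for k, a_k in enumerate(a))
    den = sum(b_k * p**k for k, b_k in enumerate(b))
    return num / den

def steady_state_cos(a, b, omega, t):
    """Steady-state response to cos(omega*t): real part of Y(i*omega) * exp(i*omega*t)."""
    y = transmission(a, b, 1j * omega)
    return (y * cmath.exp(1j * omega * t)).real

# Hypothetical first-order network: V + 0.5 dV/dt = E
a_coeffs = [1.0]
b_coeffs = [1.0, 0.5]
omega = 3.0
y = transmission(a_coeffs, b_coeffs, 1j * omega)
inphase = y.real        # plays the role of G(omega^2)
quadrature = y.imag     # plays the role of omega * H(omega^2)
```

With these definitions the response at any instant should agree with `inphase * cos(omega*t) - quadrature * sin(omega*t)`, which is the inphase and quadrature decomposition described above.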


A.3  RELATIONSHIP BETWEEN IMPULSIVE ADMITTANCE AND
TRANSMISSION FUNCTION

A relationship between the impulsive admittance and the transmission function of a
network may be obtained from (7). Putting $E(t) = e^{pt}$ when t > 0, we get
$$ V(t) = e^{pt}\int_{0-}^{t} W(\tau)\, e^{-p\tau}\, d\tau = e^{pt}\int_{0-}^{\infty} W(\tau)\, e^{-p\tau}\, d\tau - e^{pt}\int_{t}^{\infty} W(\tau)\, e^{-p\tau}\, d\tau. \tag{9} $$


The second term in (9) is a transient term due to the fact that we have taken
$E(t) \equiv 0$ when t < 0. The first term in (9), which involves the time only through
$e^{pt}$, is the steady-state term. Comparing this term with (8) we get
$$ Y(p) = \int_{0-}^{\infty} W(t)\, e^{-pt}\, dt \tag{10} $$
or, in the notation which will be introduced in the next section,
$$ Y(p) = L[W(t)]. \tag{11} $$


A.4  LAPLACE AND INVERSE LAPLACE TRANSFORMS



The  frequent  use  which  is  made  of  the 
Laplace  transform  and  its  inverse,  in  the 
analysis  and  design  of  fixed  linear  networks, 
warrants  a  brief  discussion  of  these  trans- 
forms. 

Given a function f(t) which is identically zero when t < 0, its Laplace transform g(p) is
defined by the formula
$$ g(p) = L[f(t)] = \int_{0-}^{\infty} f(t)\, e^{-pt}\, dt. \tag{12} $$
This is usually written with 0 for the lower limit, but by having the point t = 0 inside the
range of integration, instead of at the end, we secure the same advantages for (12) that we
gained in the case of (2) by having the point $\lambda = t$ inside the range of integration. Since f(t)
is identically zero when t < 0 we could write $-\infty$ for the lower limit in (12), but this would
run the risk of confusion with the so-called "bilateral Laplace transform." On the whole,
it is worth while to have a constant reminder that functions f(t) which are not identically
zero when t < 0 are ruled out.

The integral in (12) is usually not convergent for all values of p. That is, in order to
secure convergence of the integral, it may be necessary to assume R(p) > a, where R(p) is
the real part of p, and a is a real number. The result of the integration is a representation of
g(p) in the half-plane R(p) > a. Since the representation is analytic throughout the
half-plane, the principle of analytic continuation allows us to extend the definition of g(p) to
the remainder of the p-plane.
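
The definition (12) and its half-plane of convergence can be checked numerically. The sketch below is only an illustration under stated assumptions: the example function $f(t) = e^{-2t}$, the trapezoidal rule, the truncation point `t_max`, and all names are choices made here, not part of the text. For this f(t) the integral converges when R(p) > -2 and should approach 1/(p + 2).

```python
import cmath
import math

def laplace_numeric(f, p, t_max=40.0, steps=200_000):
    """Trapezoidal estimate of the integral of f(t)*exp(-p*t) over 0 <= t <= t_max."""
    h = t_max / steps
    total = 0.5 * (f(0.0) + f(t_max) * cmath.exp(-p * t_max))
    for k in range(1, steps):
        t = k * h
        total += f(t) * cmath.exp(-p * t)
    return total * h

def f(t):
    # Hypothetical example: f(t) = exp(-2t) for t >= 0, identically zero for t < 0
    return math.exp(-2.0 * t)

p = 1.0 + 3.0j                    # a point inside the half-plane R(p) > -2
g_numeric = laplace_numeric(f, p)
g_exact = 1.0 / (p + 2.0)
```

The closed form 1/(p + 2) remains meaningful for other values of p (except p = -2), which is the analytic continuation referred to above; the numerical integral itself only converges in the half-plane.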

Given a function g(p) which is analytic throughout the half-plane R(p) > c where c is
a real number, its inverse Laplace transform f(t) is given by the formula
$$ f(t) = L^{-1}[g(p)] = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} g(p)\, e^{pt}\, dp \tag{13} $$
provided f(t) is identically zero when t < 0. If the result of the integration in (13) is not
identically zero when t < 0, g(p) is not a Laplace transform and the application of the
inverse transformation to it is meaningless.

Translation Theorem

A useful theorem can be established at this point. This is the translation theorem.
If
$$ G(p) = L[F(t)] $$
then
$$ L^{-1}[G(p)\, e^{-pa}] = F(t - a) $$
provided that $F(t - a) \equiv 0$ when t < 0. Translation is to the right or left according as a is
positive or negative.

If it happens that $F(t) \equiv 0$ when $t < t_0$ where $t_0 > 0$, then the restriction is that
$a > -t_0$. That is, a limited amount of translation to the left is permissible. In general, $t_0 = 0$
and the restriction is therefore that a > 0. This theorem follows readily from (12) or (13).

In all of the applications of (13) which we have any occasion to make in the analysis and
design of fixed linear networks, the function g(p) may be resolved into a sum of terms of
the form $G(p)\, e^{-pa}$ where a > 0 and G(p) is a rational algebraic function with real
coefficients. Making use of the translation theorem, the problem of evaluating $L^{-1}[g(p)]$
reduces to that of evaluating $L^{-1}[G(p)]$. Now, G(p) may be resolved into a sum of terms of
the form $p^m$ or $1/(p - \alpha)^{m+1}$ where m = 0, 1, 2, .... We
shall consider these two cases separately.

The case $G(p) = p^m$ will be treated by means of (12) and some limiting processes. In
Section A.1 the unit impulse was regarded as the limit of a rectangular pulse of duration T and
amplitude 1/T. By means of (12) the Laplace transform of such a pulse over the interval
0 < t < T is
$$ \frac{1 - e^{-pT}}{pT}. $$
Hence
$$ L[\delta_0(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{pT} = 1. $$
Formally therefore
$$ L^{-1}[1] = \delta_0(t). \tag{14} $$

Similarly, the Laplace transform of a pulse over the interval a < t < a + T where a > 0 is
$$ e^{-pa}\, \frac{1 - e^{-pT}}{pT}. $$
Hence
$$ L[\delta_0(t - a)] = \lim_{T \to 0} e^{-pa}\, \frac{1 - e^{-pT}}{pT} = e^{-pa}. $$
Formally therefore
$$ L^{-1}[e^{-pa}] = \delta_0(t - a). $$
The last result follows directly from (14) using the translation theorem.
Next, let
$$ \delta_1(t) = \lim_{T \to 0} \frac{\delta_0(t) - \delta_0(t - T)}{T}. $$
This is the limiting case, as shown in Figure 5, of two impulses of strengths 1/T and -1/T
separated by a time interval T. It may be called an impulse of second order. By (12) and the
previous results
$$ L[\delta_1(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{T} = p. $$
Formally therefore
$$ L^{-1}[p] = \delta_1(t). \tag{15} $$

Figure 5. An impulse doublet.


Proceeding in this fashion we may define an impulse of (m + 1)th order as
$$ \delta_m(t) = \lim_{T \to 0} \frac{\delta_{m-1}(t) - \delta_{m-1}(t - T)}{T} \tag{16} $$
and we may then show that
$$ L[\delta_m(t)] = p^m. $$
Formally therefore
$$ L^{-1}[p^m] = \delta_m(t). \tag{17} $$

This disposes of the case $G(p) = p^m$ where m = 0, 1, 2, ....
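
The limiting processes behind (14) and (15) are easy to observe numerically. The following sketch simply evaluates the two finite-T expressions for a shrinking T; the function names and the sample value of p are hypothetical choices made for illustration.

```python
import cmath

def pulse_transform(p, T):
    """Laplace transform of a rectangular pulse of duration T and amplitude 1/T starting at t = 0."""
    return (1 - cmath.exp(-p * T)) / (p * T)

def doublet_transform(p, T):
    """Transform of two impulses of strengths 1/T and -1/T separated by T (before the limit is taken)."""
    return (1 - cmath.exp(-p * T)) / T

p = 2.0 + 1.0j   # arbitrary sample point in the p-plane
for T in (1e-1, 1e-3, 1e-6):
    print(T, pulse_transform(p, T), doublet_transform(p, T))
```

As T decreases the first column of values should approach 1 and the second should approach p, in line with the formal results for the impulse and the doublet.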

The case $G(p) = 1/(p - \alpha)^{m+1}$ will be treated by means of (13) and Jordan's lemma.

Jordan's Lemma

If all the singularities of G(p) can be enclosed by a circle of finite radius with center at
the origin, and if $G(p) \to 0$ uniformly with respect to arg p as $|p| \to \infty$, then
$$ \lim_{P \to \infty} \int_{\Gamma} G(p)\, e^{pt}\, dp = 0 $$
where $\Gamma$ is a semicircle of radius P, with center at the origin, to the right of the imaginary axis
if t is negative, to the left of the imaginary axis if t is positive.

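By the use of this lemma the contour of integration in (13) may be closed and the
integration may then be performed by the method of residues. In the case
$$ G(p) = \frac{1}{(p - \alpha)^{m+1}} \quad \text{where } m = 0, 1, 2, \ldots $$
we readily obtain
$$ L^{-1}\left[\frac{1}{(p - \alpha)^{m+1}}\right] = \begin{cases} 0, & t < 0 \\ \dfrac{t^m e^{\alpha t}}{m!}, & t > 0. \end{cases} \tag{18} $$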
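
The step from (13) to the result for t > 0 can be spelled out as a short residue calculation. This is a sketch under the assumption that the abscissa c in (13) lies to the right of the pole at $p = \alpha$; the notation otherwise follows the text.

$$ \frac{1}{2\pi i}\oint \frac{e^{pt}}{(p - \alpha)^{m+1}}\, dp = \operatorname*{Res}_{p = \alpha} \frac{e^{pt}}{(p - \alpha)^{m+1}} = \frac{1}{m!} \left. \frac{d^m}{dp^m}\, e^{pt} \right|_{p = \alpha} = \frac{t^m e^{\alpha t}}{m!} \qquad (t > 0). $$

For t < 0 the contour is closed to the right, where the integrand has no singularities, so the integral vanishes.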


An important special case of (18), corresponding to $\alpha = 0$, is
$$ L^{-1}\left[\frac{1}{p^{m+1}}\right] = \frac{t^m}{m!}, \qquad t > 0. \tag{19} $$


Another  useful  theorem  which  is  readily 
established  by  means  of  (12)  and  (13)  is 
Borel's  theorem. 


Borel's Theorem

If $g(p)$, $g_1(p)$, $g_2(p)$ are the Laplace transforms of $f(t)$, $f_1(t)$, $f_2(t)$, respectively, and if
$$ g(p) = g_1(p)\, g_2(p) $$
then
$$ f(t) = \int_0^t f_1(t - \lambda)\, f_2(\lambda)\, d\lambda = \int_0^t f_1(\tau)\, f_2(t - \tau)\, d\tau. $$

The functions $f_1(t)$ and $f_2(t)$ are subject to conditions which permit the inversion of the
order of integration in the following proof. However, these conditions are seldom of any
concern. We have
$$ f(t) = L^{-1}\{ g_1(p) \cdot L[f_2(t)] \}. $$
Inverting the order of integration and noting that
$$ \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} g_1(p)\, e^{p(t - \lambda)}\, dp = \begin{cases} 0 & \text{if } \lambda > t \\ f_1(t - \lambda) & \text{if } \lambda < t \end{cases} $$
we obtain the result stated in the theorem.
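
Borel's theorem can be checked on a simple pair of functions. In the sketch below, the choice $f_1(t) = e^{-t}$ and $f_2(t) = e^{-2t}$, the trapezoidal rule, and the function names are all assumptions made for illustration. The product of the transforms is 1/((p + 1)(p + 2)), whose inverse by partial fractions and (18) is $e^{-t} - e^{-2t}$.

```python
import math

def convolve_at(f1, f2, t, steps=20_000):
    """Trapezoidal estimate of the integral of f1(t - x) * f2(x) over 0 <= x <= t."""
    h = t / steps
    total = 0.5 * (f1(t) * f2(0.0) + f1(0.0) * f2(t))
    for k in range(1, steps):
        x = k * h
        total += f1(t - x) * f2(x)
    return total * h

def f1(t):
    return math.exp(-t)          # transform 1/(p + 1)

def f2(t):
    return math.exp(-2.0 * t)    # transform 1/(p + 2)

t = 1.5
f_numeric = convolve_at(f1, f2, t)
f_exact = math.exp(-t) - math.exp(-2.0 * t)
```

The two values should agree to within the accuracy of the quadrature.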


A.5  ALTERNATIVE EXPRESSION OF THE
RESPONSE-TO-SIGNAL RELATIONSHIP

The result (8) obtained in Section A.2 suggests an operational expression of the form
$$ V(t) = Y(p) \cdot E(t) \tag{20} $$
for the response-to-signal relationship whatever the signal E(t) might be. If the
equivalence of this operational expression to (2) is taken as a matter of definition we may readily
discover the nature of the implied operation.

In the light of Borel's theorem, (2) may be expressed in the form
$$ L[V(t)] = L[W(t)] \cdot L[E(t)] $$
under the permissible assumption that $E(t) \equiv 0$ when t < 0. Hence
$$ V(t) = L^{-1}\{ L[W(t)] \cdot L[E(t)] \} $$
or, by (11),
$$ V(t) = L^{-1}\{ Y(p) \cdot L[E(t)] \}. \tag{21} $$

This is, therefore, in general the meaning of the operational expression (20).^d


^d We note that if $S(p) = L[E(t)]$, the operational expression
$$ V(t) = S(p) \cdot W(t) $$
is equivalent to (20). This form is used in Section 10.4 and in Appendix B.

The symmetry of the impulsive admittance is expressed by
$$ W(T - t) = W(t). $$
Since W(t) = 0 when t < 0, it must be so also when t > T. Hence
$$ Y(p) = \int_0^{T/2} W(t)\, e^{-pt}\, dt + \int_{T/2}^{T} W(t)\, e^{-pt}\, dt. $$
By a change of variable of integration the second term may be expressed in the form
$$ \int_0^{T/2} W(T - t)\, e^{-p(T - t)}\, dt $$
or, because of the symmetry,
$$ e^{-pT} \int_0^{T/2} W(t)\, e^{pt}\, dt. $$
Hence, if the first term in Y(p) be
$$ Y_1(p) = \int_0^{T/2} W(t)\, e^{-pt}\, dt $$
we have
$$ Y(p) = Y_1(p) + Y_1(-p)\, e^{-pT} = [\, Y_1(p)\, e^{pT/2} + Y_1(-p)\, e^{-pT/2}\, ]\, e^{-pT/2}. $$
At real frequencies ($p = i\omega$) the bracketed factor is evidently an even real function of $\omega$.
Hence
$$ Y(i\omega) = Q(\omega^2) \cdot e^{-i\omega T/2}. \tag{24} $$

Apart from discontinuities in the phase angle of the transmission function at real frequencies
$\omega$ for which $Q(\omega^2)$ is zero, the phase angle is proportional to frequency. Such a transmission
function is referred to as a linear phase transmission function. Sinusoidal components of the
signal, of frequencies less than the lowest frequency at which $Q(\omega^2)$ vanishes, suffer phase
retardations in transmission in proportion to their frequencies. These components therefore
contribute no delay distortion. They are delayed by a uniform amount, just as they are in a
properly terminated distortionless, uniform transmission line, although in the case of (24)
they contribute amplitude or loss distortion through $Q(\omega^2)$. The delay in (24) is just half
of the "smoothing time" T.

A.8  SERIES RELATIONSHIPS BETWEEN
IMPULSIVE ADMITTANCE AND
TRANSMISSION FUNCTION

Two useful series relationships between impulsive admittances and transmission functions
will be derived in this section.

Assume that W(t) admits the series expansion
$$ W(t) = A_0 + A_1 t + \cdots + A_m \frac{t^m}{m!} + \cdots \tag{25} $$
for small positive values of t. Then by (11) and (19)
$$ Y(p) = \frac{A_0}{p} + \frac{A_1}{p^2} + \cdots + \frac{A_m}{p^{m+1}} + \cdots. \tag{26} $$


If $A_0 \neq 0$ the transmission cannot drop off faster than 6 db per octave as the frequency
increases indefinitely. If the transmission is to drop off ultimately at the rate of 6k db per
octave all of the A's up to and including $A_{k-2}$ must be zero. This is to say that the impulsive
admittance and all of its derivatives of orders up to and including the (k - 2)th must vanish
at t = 0.

Next, let us suppose that the impulsive admittance and all of its derivatives of orders up
to and including the (k - 2)th are continuous through all values of t including t = 0 except
that the (k - 2)th derivative is discontinuous only at t = a. We may resolve the impulsive
admittance into the sum $W_1(t) + W_2(t)$ where $W_1(t)$ and all of its derivatives of orders up to
and including the (k - 2)th are continuous through all values of t including t = 0, while
$W_2(t) = 0$ for all values of t < a. Then, for small positive values of t - a
$$ W_2(t) \approx \frac{A_{k-2}\, (t - a)^{k-2}}{(k - 2)!} \qquad (A_{k-2} \neq 0) $$
whence, by (19) and the translation theorem, the corresponding part of the transmission
function behaves at high frequencies as
$$ \frac{A_{k-2}\, e^{-pa}}{p^{k-1}}. $$
Hence the transmission cannot drop off ultimately faster than 6(k - 1) db per octave. We
may summarize these results in the asymptotic loss theorem.


Asymptotic  Loss  Theorem. 

If the transmission is to drop off ultimately
at the rate of 6k db per octave as the frequency
increases  indefinitely,  the  impulsive  admittance 
and  all  of  its  derivatives  of  orders  up  to  and 
including  the  (k  —  2)th  must  be  continuous 
through  all  values  of  t  including  t  =  0. 

Discontinuities in W(t) or in some derivative of W(t) cannot occur except at t = 0 in
the case of physical lumped element networks. Practically, however, rapid changes in W(t)
or in some derivative of W(t), at any value of t, may be expected to be associated with much
the same behavior of the transmission at reasonably high frequencies. As an example
consider the case

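$$ W(t) = \alpha\beta\, \frac{e^{-\alpha t} - e^{-\beta t}}{\beta - \alpha} \qquad (\beta > \alpha > 0), $$
$$ Y(p) = \frac{\alpha\beta}{(p + \alpha)(p + \beta)}. $$
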

W(t) is continuous through t = 0 as long as $\beta$ is finite but becomes discontinuous there in the
limit as $\beta \to \infty$. The first derivative of W(t) is discontinuous through t = 0 even when $\beta$ is
finite. The ultimate slope of the transmission is 12 db per octave, in accordance with the
asymptotic loss theorem, but in the range $\alpha < \omega < \beta$ the transmission appears to have a
slope of only 6 db per octave.
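
The two slopes can be estimated numerically. The sketch below assumes a transmission function with simple poles at $-\alpha$ and $-\beta$ and a constant numerator (the constant does not affect the slope); the function names and the sample values of $\alpha$ and $\beta$ are hypothetical and chosen only to separate the two frequency ranges clearly.

```python
import math

def gain_db(alpha, beta, omega):
    """20*log10 |Y(i*omega)| for Y(p) = alpha*beta / ((p + alpha)(p + beta))."""
    y = alpha * beta / ((1j * omega + alpha) * (1j * omega + beta))
    return 20.0 * math.log10(abs(y))

def slope_db_per_octave(alpha, beta, omega):
    """Drop in transmission, in db, between omega and 2*omega."""
    return gain_db(alpha, beta, omega) - gain_db(alpha, beta, 2.0 * omega)

alpha, beta = 1.0, 1.0e4                               # hypothetical, widely separated values
mid_band = slope_db_per_octave(alpha, beta, 1.0e2)     # alpha << omega << beta
ultimate = slope_db_per_octave(alpha, beta, 1.0e7)     # omega >> beta
```

Here `mid_band` should come out close to 6 db per octave and `ultimate` close to 12 db per octave, so the asymptotic behavior is only reached well above $\beta$.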

The  importance  of  the  observations  made  in 
the  preceding  paragraph,  in  the  design  of  a 
network,  is  that  if  we  attempt  to  approximate 
a  W(t)  which  has  a  discontinuity  in  a  deriva- 
tive of  lower  order  at  t  =  a  than  at  t  =  0,  the 
fact  that  the  physical  approximation  must  have 
continuous  derivatives  of  all  orders  and  through 
all values of t except t = 0 is not very significant.
The ultimate slope of the transmission
may  not  be  reached  until  the  frequency  is  too 
high  to  be  of  any  importance. 

Another useful relationship between impulsive admittance and transmission function
follows from the assumption that
$$ \int_0^{\infty} t^m\, W(t)\, dt $$
is finite for m = 1, 2, .... If we expand the exponential in
$$ Y(p) = \int_0^{\infty} W(t)\, e^{-pt}\, dt $$
into a power series in pt we get
$$ Y(p) = M_0 - M_1 p + M_2 \frac{p^2}{2!} - M_3 \frac{p^3}{3!} + \cdots \tag{27} $$
where
$$ M_m = \int_0^{\infty} t^m\, W(t)\, dt. \tag{28} $$
The quantity $M_m$ is the mth moment of the impulsive admittance.

When $M_0 = 1$ we speak of the response of the network as a weighted average of the impressed
signal, and speak of the impulsive admittance W(t) as the weighting function.


A.9  PHYSICAL RESTRICTIONS ON THE
TRANSMISSION FUNCTION

The transmission function Y(p) of a lumped element network is a rational algebraic
function of p. It is real for real values of p (A.2). Hence, the coefficients must be real, and
therefore the roots and poles must either be real or occur in conjugate complex pairs.

Such a function may be expanded into the sum of a polynomial and a rational function
whose numerator is of lower degree than the denominator. The latter may therefore be
properly expanded into partial fractions. For a partial fraction of the form
$$ \frac{1}{(p - \alpha)^m} \qquad \text{where } m = 1, 2, \ldots $$
the contribution to the impulsive admittance W(t) is by (18)
$$ L^{-1}\left[\frac{1}{(p - \alpha)^m}\right] = \frac{t^{m-1} e^{\alpha t}}{(m - 1)!} \qquad (t > 0). $$
For a pair of partial fractions of the form
$$ \frac{A + iB}{(p - \alpha + i\beta)^m} + \frac{A - iB}{(p - \alpha - i\beta)^m} $$
the contribution to the impulsive admittance is
$$ \frac{2\, t^{m-1} e^{\alpha t}}{(m - 1)!}\, (A \cos \beta t + B \sin \beta t). $$

Since the impulsive admittance is the response to an impulsive signal it is clear that for
a stable network the impulsive admittance must be free of terms which increase indefinitely
with time, either on account of an amplitude factor of the form $e^{\alpha t}$ where $\alpha > 0$, or, in the
event that $\alpha = 0$, on account of an amplitude factor of the form $t^{m-1}$ where m > 1. Hence, the
physical restrictions on the transmission function are:

1. No poles with positive real parts.

2. Poles on the imaginary p axis must be simple.^e
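
The two restrictions translate directly into a test on a list of pole locations. The following is a minimal sketch, assuming the poles are already known and are listed once per unit of multiplicity; the function name and the tolerance are hypothetical.

```python
def satisfies_physical_restrictions(poles, tol=1e-9):
    """poles: pole locations in the p-plane, each repeated according to its multiplicity."""
    for p in poles:
        if p.real > tol:
            return False                      # restriction 1: no poles with positive real parts
        if abs(p.real) <= tol:
            # restriction 2: a pole on the imaginary axis must be simple
            multiplicity = sum(1 for q in poles if abs(q - p) <= tol)
            if multiplicity > 1:
                return False
    return True

damped_pair = [-1 + 2j, -1 - 2j]          # allowed
growing = [0.5]                            # violates restriction 1
simple_imaginary = [1j, -1j]               # allowed by restriction 2
double_imaginary = [1j, 1j, -1j, -1j]      # violates restriction 2
```

A double pole on the imaginary axis is rejected because it would contribute a term of the form t cos βt to the impulsive admittance, which grows without limit.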

The  poles  of  a  passive  transmission  function 
correspond  to  modes  of  free  motion.lsh  Each  of 
them  may  be  shownlM  to  satisfy  an  equation  of 
the  form 

$$ pT + F + \frac{V}{p} = 0 $$
where T, F, V are positive quantities whose values depend upon the particular mode and
its activity. However, T is zero in the absence of kinetic energy, F is zero in the absence of
energy dissipation, and V is zero in the absence of potential energy. It follows that in the
absence of coils or in the absence of condensers, the transmission function must have poles only
on the negative real p axis.

^e Poles on the imaginary p axis must also be ruled out on the ground that persistent transients cannot be
tolerated any more than growing

For  extremely  narrow-band,  low-pass  appli- 
cations, such  as  data  smoothing,  it  is  not  prac- 
ticable to  build  networks  which  call  for  coils 
because  these  generally  turn  out  to  be  of  many 
thousands  of  henries  in  inductance.  The  exclu- 
sion of  coils  from  these  applications  does  not, 
however,  rule  out  transmission  functions  with 
complex  poles.  These  may  be  realized  with  RC 
networks  in  feedback  amplifier  circuits  as  is 
shown  in  Chapter  12. 

A.10  QUASI-DISTORTIONLESS
TRANSMISSION NETWORKS

A  quasi-distortionless  transmission  network 
is  one  which  is  distortionless  only  in  a  certain 
sense.  This  sense  will  be  made  clear  in  this 
section. 

Let
$$ Y(p) = \frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n}. \tag{29} $$
This may also be written in the form
$$ Y(p) = 1 + c_1 p + \frac{c_2 p^2}{2!} + \cdots + \frac{c_r p^r}{r!} + p^{r+1} g(p). \tag{30} $$
Obviously g(p) will be a rational function with the same denominator as Y(p) and a
numerator of (n - 1)th degree. If we now apply a signal of the form


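$$ E(t) = \begin{cases} 0 & \text{for } t < 0 \\ t^r & \text{for } t > 0 \end{cases} $$
the response, by (21), will be
$$ V(t) = t^r + r c_1 t^{r-1} + \frac{r(r - 1)}{2!}\, c_2 t^{r-2} + \cdots + c_r + r!\, L^{-1}[g(p)] \qquad (t > 0). $$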

If the coefficients in the rational expression for Y(p) are such that
$$ c_1 = t_f, \quad c_2 = t_f^2, \quad \ldots, \quad c_r = t_f^r \tag{31} $$
then
$$ V(t) = (t + t_f)^r + r!\, L^{-1}[g(p)] \qquad (t > 0). \tag{32} $$
The second term vanishes exponentially with time. The first term is an advanced or a
retarded facsimile of the applied signal according to whether $t_f$ is positive or negative. We
shall say that Y(p) is the transmission function of a network which is quasi-distortionless
to the signal $t^r$.

Obviously a transmission network which is quasi-distortionless to the signal $t^r$ must also be
quasi-distortionless to every signal $t^s$ where s is a positive integer less than r, including zero.
Hence we may state the quasi-distortionless transmission theorem.

Quasi-Distortionless  Transmission 
Theorem 

If the signal
$$ E(t) = \begin{cases} 0 & \text{for } t < 0 \\ \text{polynomial of degree } r \text{ at most in } t & \text{for } t > 0 \end{cases} $$
is applied to a "quasi-distortionless transmission network of order r," the response will be
of the form
$$ V(t) = E(t + t_f) + O(e^{-t}) \qquad \text{for } t > 0, $$
where $O(e^{-t})$ stands for terms which vanish exponentially with time.

If $t_f > 0$ the transmission network is a predictor
for polynomials of degree r at most.
However,  it  does  not  begin  to  predict  properly 
until  some  time  has  elapsed  after  the  start  of 
the  signal,  or  of  a  new  analytic  segment  of  the 
signal;  that  is,  until  the  transients  have  sub- 
sided sufficiently. 

If $t_f = 0$ the transmission network may be regarded as a delay-corrected smoother for
polynomials of degree r at most. This is obtained simply by taking
$$ a_1 = b_1, \quad a_2 = b_2, \quad \ldots, \quad a_r = b_r \tag{33} $$
in (29).
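
The coefficients $c_k$ of (30) can be obtained from the a's and b's of (29) by ordinary power-series division, which makes conditions (31) and (33) easy to test. The sketch below is an illustration only: the function names and the two example networks (with denominator $(1 + p)^3$) are assumptions made here.

```python
from fractions import Fraction
from math import factorial

def series_coefficients(a, b, r):
    """First r+1 power-series coefficients y_0..y_r of A(p)/B(p) about p = 0.

    a and b list polynomial coefficients in ascending powers of p, with b[0] != 0.
    """
    a = [Fraction(x) for x in a]
    b = [Fraction(x) for x in b]
    y = []
    for k in range(r + 1):
        a_k = a[k] if k < len(a) else Fraction(0)
        acc = a_k - sum(b[j] * y[k - j] for j in range(1, k + 1) if j < len(b))
        y.append(acc / b[0])
    return y

def c_coefficients(a, b, r):
    """c_k = k! * y_k, matching the form 1 + c_1 p + c_2 p^2/2! + ... + c_r p^r/r!."""
    y = series_coefficients(a, b, r)
    return [factorial(k) * y[k] for k in range(r + 1)]

# Hypothetical second-order predictor with t_f = 1: c_1 = c_2 = 1 is required.
predictor = c_coefficients([1, 4, Fraction(13, 2)], [1, 3, 3, 1], 2)

# Hypothetical delay-corrected smoother: a_1 = b_1, a_2 = b_2, so c_1 = c_2 = 0.
smoother = c_coefficients([1, 3, 3], [1, 3, 3, 1], 2)
```

In the first example the numerator was chosen so that $a_1 = b_1 + t_f$ and $a_2 = b_2 + b_1 t_f + t_f^2/2$, which is what (31) requires when r = 2.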


A.11  VARIABLE LINEAR NETWORKS


A variable linear transmission network is one in which the response V(t) is related to the
impressed signal E(t) by the linear differential equation (1) with coefficients which are
prescribed functions of t. The solutions of such a
differential  equation  also  obey  the  superposi- 
tion principle.  Thus  it  is  possible  in  this  case 
also  to  formulate  the  response  of  the  network 
to  any  signal  in  terms  of  its  response  to  a 
standard  impulsive  signal. 

The response of a variable network to an impulse or any form of signal depends,
however, on the time at which the signal is applied. For an impulsive signal applied at time
$\lambda$ the response at time t will be represented by $W(t, \lambda)$. This is still called the "impulsive
admittance." In the theory of linear differential equations it is known as a Green's function.
Physically, it must be identically zero for $t < \lambda$.

The superposition theorem may now be written in the form
$$ V(t) = \int_0^{t+} E(\lambda) \cdot W(t, \lambda)\, d\lambda \tag{34} $$
provided the network has been properly designed and set into operation at t = 0. If
$$ \int_0^{t+} W(t, \lambda)\, d\lambda = 1 $$
for all values of t > 0, the response may be interpreted as a weighted average of the
signal. We note that in order to interpret the response as a weighted average of the signal,
it is now no longer necessary to take the lower
limit in (34) as $-\infty$, as it was in the case of
(2)  for  a  fixed  network.  In  other  words,  a 
variable  network  can  be  designed  and  set  into 
operation  at  any  time  so  that  components  of 
the  signal  which  arrive  before  that  time  are 
completely  ignored. 

The  analysis  and  design  of  variable  linear 
networks  are  in  general  much  more  difficult 


than  those  of  fixed  linear  networks.  This  is  due 
largely  to  the  fact  that  there  does  not  yet  exist 
a  technique  corresponding  to  the  steady-state 
and  operational  methods  used  in  connection 
with  fixed  networks.  However,  there  is  a  class 
of  variable  networks  whose  analysis  and  design 
are  greatly  facilitated  by  the  fact  that  they  are 
related  to  fixed  networks  by  a  transformation 
of  the  time  variable. 

Consider the linear differential equation
$$ b_n \frac{d^n V}{dz^n} + b_{n-1} \frac{d^{n-1} V}{dz^{n-1}} + \cdots + b_1 \frac{dV}{dz} + V = E $$
with constant coefficients. With appropriate restrictions on the roots of the characteristic
function
$$ b_n \lambda^n + b_{n-1} \lambda^{n-1} + \cdots + b_1 \lambda + 1 $$
it represents the response-to-signal relationship in a fixed network, if z is proportional
directly to time. However, if z is a more general function of the time, it will correspond to
a variable network. The kind of transformation which is desired here is one which transforms
the range $-\infty < z < +\infty$ into the range $0 < t < +\infty$ with a one-to-one correspondence.
Thus, we may take $z = \log \theta(t)$ where $\theta(t)$ is a positive monotonic increasing function of t in
the range $0 < t < +\infty$, with $\lim_{t \to 0} \theta(t) = 0$. Several examples of $\theta(t)$, including $\theta(t) = t$, are
considered in detail in Chapter 14.
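
As a small worked case of this change of variable, consider a first-order equation of the kind described above, $b_1\, dV/dz + V = E$. This particular equation is an illustrative assumption and is not claimed to be one of the examples of Chapter 14. Taking $\theta(t) = t$ gives $z = \log t$ and

$$ \frac{dV}{dz} = \frac{dV}{dt}\, \frac{dt}{dz} = t\, \frac{dV}{dt}, \qquad \text{so that} \qquad b_1\, t\, \frac{dV}{dt} + V = E. $$

The constant-coefficient equation in z thus becomes an equation in t whose coefficient $b_1 t$ varies with time, which is the sense in which a fixed network in z corresponds to a variable network in t.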
THEORETICAL MODIFICATIONS OF SMOOTHING FUNCTIONS TO FIT

NONUNIFORM  NOISE  SPECTRA 


Best smoothing or weighting functions have been determined in Chapters 10 and 11
under the assumption of random noise with flat spectrum. It has not been worth while in
practice to base the choice of best weighting functions on any more elaborate considerations of
actual noise spectra, for at least three reasons:

1.  The  effectiveness  of  a  smoothing  network 
shape  of  the  weighting  function. 

2.  Noise  spectra  are  subject  to  variations, 
due  to  factors  which  it  is  not  desirable  in  prac- 
tice to  attempt  to  control. 

3.  Elaborate  smoothing  functions  require 
elaborate  networks  with  close  tolerances  on  ele- 
ment values. 

Nevertheless,  the  theory  of  smoothing  pre- 
sented in  this  monograph  would  not  be  com- 
plete without  showing  how  more  general  shapes 
of  noise  spectra  can  be  considered.  Two  meth- 
ods are  presented  here,  which  are  generaliza- 
tions of  those  presented  in  Sections  10.3  and 
10.4,  respectively. 


B.1  PHILLIPS AND WEISS THEORY^7

Let g(t) be the tracking error, and W(t) the impulsive admittance of a smoothing and
prediction circuit with smoothing time T. Then the error in prediction due to tracking error
only, is
$$ V(t) = \int_0^T g(t - \tau) \cdot W(\tau)\, d\tau. $$
The impulsive admittance $W(\tau)$ will depend also upon the time of flight which, for purposes
of analysis, is assumed to be constant. The mean square error is then
$$ \overline{V^2} = \lim_{L \to \infty} \frac{1}{2L} \int_{-L}^{L} V^2(t)\, dt = \int_0^T \!\! \int_0^T W(\tau_1) \cdot C(\tau_1 - \tau_2) \cdot W(\tau_2)\, d\tau_1\, d\tau_2 $$
where
$$ C(x) = \lim_{L \to \infty} \frac{1}{2L} \int_{-L}^{L} g(\lambda) \cdot g(\lambda + x)\, d\lambda. \tag{1} $$
C(x) is the autocorrelation of the error time-function $g(\lambda)$.

For an nth order smoothing and prediction circuit $\overline{V^2}$ is now minimized with respect to the
impulsive admittance under the restrictions*
$$ \int_0^T \tau^m\, W(\tau)\, d\tau = (-t_f)^m \qquad (m = 0, 1, 2, \ldots, n). \tag{2} $$


Hence $W(\tau)$ must satisfy the integral equation
$$ \int_0^T C(t - \tau) \cdot W(\tau)\, d\tau = k_0 + k_1 t + \cdots + k_n t^n \qquad (0 \le t \le T) $$
where the $k_m$ are constants to be determined. Now, if
$$ \int_0^T C(t - \tau) \cdot W_m(\tau)\, d\tau = t^m \qquad (0 \le t \le T) \quad (m = 0, 1, 2, \ldots, n) \tag{3} $$
then
$$ W(\tau) = k_0 W_0(\tau) + k_1 W_1(\tau) + \cdots + k_n W_n(\tau). \tag{4} $$

The procedure is then to determine C(x) from (1), the $W_m(\tau)$ from (3), the $k_m$ from (2) and
(4), and finally $W(\tau)$ from (4). It may be noted that, in general, every $k_m$ will be a
polynomial of nth degree in $t_f$. Hence the $W_m(\tau)$ appearing here are not the same as those
defined in Chapter 11, although $W(\tau)$ should be the same if the same $W_0(\tau)$ is used in Chapter
11.

A difficulty of the theory given above is in the solution of the integral equations (3). This
difficulty is avoided in the theory given in the next section. However, the integral equations
are easily solved in case of flat random noise, when C(x) is simply an impulse of strength K
say, at x = 0. Then
$$ W_m(\tau) = \frac{\tau^m}{K}, \qquad 0 < \tau < T. $$
Since the strength is irrelevant, it may be taken equal to T so that $W_0(\tau)$ will be normalized.

* These follow from the discussions in Sections A.8
and A.10, especially equations (27), (28), (30), and

For a linear prediction circuit it is then found that
$$ W(\tau) = 2\left(2 + \frac{3 t_f}{T}\right) W_0(\tau) - \frac{6}{T}\left(1 + \frac{2 t_f}{T}\right) W_1(\tau). $$

Putting T = 1 this may be expressed as
$$ W(\tau) = W_0(\tau) + G_1(-t_f)\, W_1(\tau) $$
in terms of the $G_m(\tau)$ and $W_m(\tau)$ of Section 11.3.
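
For flat random noise the constants $k_m$ follow from the moment restrictions (2) alone, since $W_m(\tau) = \tau^m / T$ once the impulse strength is taken equal to T. The sketch below sets up and solves the resulting linear equations exactly; the function names are hypothetical and the small elimination routine is an assumption made to keep the example self-contained.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with exact Fractions; assumes a nonsingular system."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def flat_noise_k(n, T, t_f):
    """k_0..k_n for W = sum of k_m * W_m with W_m(tau) = tau**m / T, from the restrictions (2)."""
    T, t_f = Fraction(T), Fraction(t_f)
    # integral over 0..T of tau**j * tau**m / T equals T**(j + m) / (j + m + 1)
    matrix = [[T ** (j + m) / (j + m + 1) for m in range(n + 1)] for j in range(n + 1)]
    rhs = [(-t_f) ** j for j in range(n + 1)]
    return solve_linear(matrix, rhs)

k0, k1 = flat_noise_k(1, T=1, t_f=Fraction(1, 2))
```

For the linear case (n = 1) the values returned should match $2(2 + 3t_f/T)$ and $-(6/T)(1 + 2t_f/T)$.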

B.2  SYMMETRY OF BEST SMOOTHING
FUNCTIONS

The theory of Phillips and Weiss offers the most direct proof that the best smoothing or
weighting function must be symmetrical, regardless of the noise power spectrum. The
situation is that of minimizing (1) under only one of the restrictions (2), viz., the
normalizing condition
$$ \int_0^T W(\tau)\, d\tau = 1. \tag{5} $$
The weighting function is therefore determined, up to a constant scale factor, by the
condition that
$$ \int_0^T C(t - \tau) \cdot W(\tau)\, d\tau = k, \tag{6} $$
where k is a constant. Substituting T - t for t and $T - \tau$ for $\tau$, we have
$$ \int_0^T C(\tau - t) \cdot W(T - \tau)\, d\tau = k. \tag{7} $$
Since C(-x) = C(x), and since $W(\tau)$ is determined uniquely by (6) and (5), it follows
from (6) and (7) that
$$ W(T - \tau) = W(\tau). \tag{8} $$


B.3  GENERALIZATION OF ELEMENTARY
PULSE METHOD

The noise power transmitted through a network may be expressed in the familiar form
$$ P = \int_0^{\infty} N(\omega^2) \cdot |Y(i\omega)|^2\, d\omega $$
where $N(\omega^2)$ is the noise power spectrum and Y(p) is the transmission function of the
network. Assuming that $N(\omega^2)$ is a rational function of $\omega^2$, which is finite at all finite values of
$\omega$ including zero, it is possible to determine a rational function S(p), which has no poles on
or to the right of the imaginary axis in the p-plane with the exception of the point at
infinity, and such that
$$ |S(i\omega)|^2 = N(\omega^2). $$
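
A simple instance may make the factorization concrete. The sketch below assumes a hypothetical spectrum $N(\omega^2) = (\omega^2 + a^2)/(\omega^2 + b^2)$ with a, b > 0, for which S(p) = (p + a)/(p + b) has its only pole at p = -b, to the left of the imaginary axis. The spectrum, the constants, and the function names are all illustrative assumptions.

```python
def noise_spectrum(omega, a, b):
    """Hypothetical rational spectrum N(omega^2) = (omega^2 + a^2) / (omega^2 + b^2)."""
    return (omega**2 + a**2) / (omega**2 + b**2)

def shaping_factor(p, a, b):
    """S(p) = (p + a)/(p + b); its pole at p = -b lies to the left of the imaginary axis when b > 0."""
    return (p + a) / (p + b)

a, b = 0.5, 2.0
samples = [0.0, 0.3, 5.0]
mismatch = [abs(abs(shaping_factor(1j * w, a, b))**2 - noise_spectrum(w, a, b)) for w in samples]
```

Every entry of `mismatch` should be zero to within rounding, confirming that the squared magnitude of S on the imaginary axis reproduces the assumed spectrum.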

It may be readily shown that
$$ P = \pi \int_{0-}^{\infty} F^2(t)\, dt \tag{9} $$
where F(t) is related to the impulsive admittance W(t) by the operational equation
$$ F(t) = S(p) \cdot W(t). \tag{10} $$
The problem is now to minimize (9) under the restriction
$$ \int_{0-}^{t} W(\tau)\, d\tau = 1 \quad \text{when } t > 1. \tag{11} $$
Let
$$ S(p) = k\, \frac{Q(p)}{R(p)} $$
where
$$ Q(p) = (p + \alpha_1)(p + \alpha_2) \cdots (p + \alpha_m) $$
$$ R(p) = (p + \beta_1)(p + \beta_2) \cdots (p + \beta_n) $$
and k is of no consequence. One or more of the $\alpha$'s, but none of the $\beta$'s may be zero. Since the
existence of the integral in (9) imposes the requirement that F(t) have no discontinuities
of higher type than finite jumps in the range $0- < t < \infty$, the continuity conditions on W(t)
in (10) must depend upon the difference between m and n in the expressions for Q(p) and
R(p).

If m > n, it is fairly obvious that W(t) must be differentiable, in the ordinary sense, exactly
m - n times. In other words, W(t) and all its derivatives up to and including the
(m - n - 1)th must be continuous, but the (m - n)th derivative may have finite jumps. If m < n we
must consider the introduction into W(t) of discontinuities of higher type than finite jumps.
These discontinuities arise in the formal extension of the concept of differentiation to
functions containing finite jumps.

If a function $\varphi(t)$ has a finite jump of amplitude $A_0$ at
$t = a$, the value of $\varphi'(t)$ at that point will be indicated
formally as $A_0\cdot\delta_0(t - a)$ where $\delta_0(t - a)$ is a unit
impulse at $t = a$. If $\varphi'(a + 0) - \varphi'(a - 0) = A_1$, the
value of $\varphi''(t)$ at $t = a$ will be indicated formally as
$A_0\cdot\delta_1(t - a) + A_1\cdot\delta_0(t - a)$ where
$\delta_1(t - a)$ is a unit doublet at $t = a$. And so on, for higher
derivatives of $\varphi(t)$.

The expression (9) is a minimum under the restriction (11) if
$W(t)$ satisfies the differential equation

$$Q(p)\cdot Q(-p)\,W(t) = \text{const.} \qquad (12)$$

when $0 < t < 1$ and $Y(p)$ satisfies the condition

$$\frac{1}{2\pi i}\int_{-i\infty}^{+i\infty} S(p)\cdot S(-p)\cdot Y(p)\,e^{pt}\,dp = \text{const.} \quad \text{when } 0 < t < 1. \qquad (13)$$

The restriction (11) itself requires that $W(t) = 0$ when $t > 1$,
and

$$\int_{0-}^{1+} W(t)\,dt = 1. \qquad (14)$$


Case I. ($n = 0$)

The general solution of (12) contains $2m + 1$ constants of
integration which are determined by (14) and the $2m$ continuity
conditions that $W(t)$ and all of its derivatives up to and
including the $(m - 1)$th must vanish at $t = 0$ and $t = 1$.


Case II. ($n \ne 0$, $m \ge n$)

The general solution of (12) contains $2m + 1$ constants of
integration which are reduced to $2n$ in number by (14) and the
$2(m - n)$ continuity conditions that $W(t)$ and all of its
derivatives up to and including the $(m - n - 1)$th must vanish at
$t = 0$ and at $t = 1$. The remaining $2n$ constants are determined
by (13).

The left-hand member of (13) may be formulated by the method of
residues. The expression for $Y(p)$ should first be separated into
two parts so that

$$Y(p) = Y_L(p) + Y_R(p)\,e^{-p}$$

where $Y_L(p)$ and $Y_R(p)$ are rational functions of $p$. The path
of integration may then be closed in the left-hand half of the
$p$-plane for the first part of $Y(p)$, and in the right-hand half
for the second part. Hence, if the sum of the residues of
$S(p)\cdot S(-p)\cdot Y_L(p)\,e^{pt}$ in the left-hand half of the
$p$-plane be denoted by $\Sigma_L$ and if the sum of the residues of
$S(p)\cdot S(-p)\cdot Y_R(p)\cdot e^{p(t-1)}$ in the right-hand half
of the $p$-plane be denoted by $\Sigma_R$, then the condition (13)
reduces to

$$\Sigma_L - \Sigma_R = \text{const.} \qquad (15)$$


Case III. ($n \ne 0$, $m < n$)

The $2m + 1$ constants of integration in the general solution of
(12) are first increased to $2n + 1$ by appending the $2(n - m)$
singularities

$$\delta_0(t),\ \delta_1(t),\ \ldots,\ \delta_{n-m-1}(t)$$
$$\delta_0(t-1),\ \delta_1(t-1),\ \ldots,\ \delta_{n-m-1}(t-1)$$

and then reduced to $2n$ by (14). The remainder are determined by
(13) or (15). In formulating

$$Y(p) = \mathcal{L}[W(t)]$$

it may be noted that

$$\mathcal{L}[\delta_n(t - a)] = p^n e^{-ap} \qquad (a \ge 0).$$

Example of Case I

Let $S(p) = p^m$. The differential equation (12) requires $W(t)$ to
be a polynomial of degree $2m$. The conditions at $t = 0$ require it
to have a factor $t^m$, and those at $t = 1$, a factor $(1 - t)^m$.
This leaves only (14) to be satisfied. Hence

$$W(t) = \frac{(2m+1)!}{(m!)^2}\,[t(1-t)]^m \qquad (0 \le t \le 1)$$

in agreement with (8) of Section 10.8.
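
As a quick check on the Case I result, the following sketch (Python, exact rational arithmetic) evaluates the integral of $[t(1-t)]^m$ over $0 \le t \le 1$ by binomial expansion and confirms that the coefficient $(2m+1)!/(m!)^2$ gives unit area, as (14) requires. The function names are illustrative and are not taken from the report.

```python
from fractions import Fraction
from math import comb, factorial


def pulse_integral(m: int) -> Fraction:
    """Exact integral of [t(1-t)]^m over 0 <= t <= 1.

    Expands (1-t)^m by the binomial theorem and integrates term by term.
    """
    return sum(
        Fraction((-1) ** k * comb(m, k), m + k + 1) for k in range(m + 1)
    )


def normalizing_constant(m: int) -> Fraction:
    """Coefficient (2m+1)!/(m!)^2 of the Case I weighting function."""
    return Fraction(factorial(2 * m + 1), factorial(m) ** 2)


def weight(m: int, t: float) -> float:
    """W(t) = (2m+1)!/(m!)^2 * [t(1-t)]^m on 0 <= t <= 1, zero elsewhere."""
    if t < 0.0 or t > 1.0:
        return 0.0
    return float(normalizing_constant(m)) * (t * (1.0 - t)) ** m


if __name__ == "__main__":
    for m in range(4):
        area = normalizing_constant(m) * pulse_integral(m)
        print(m, normalizing_constant(m), area)
```

For $m = 0$ the weighting is the uniform pulse; for $m = 1$ it is the parabolic pulse $6t(1-t)$.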

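Example of Case II

Let

$$S(p) = \frac{p + \alpha}{p + \beta}.$$

Then, by (12),

$$W(t) = A_0 + A_1 e^{-\alpha t} + A_2 e^{\alpha t} \qquad (0 \le t \le 1).$$

Hence

$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p+\alpha} + \frac{A_2}{p-\alpha} - \left[\frac{A_0}{p} + \frac{A_1 e^{-\alpha}}{p+\alpha} + \frac{A_2 e^{\alpha}}{p-\alpha}\right]e^{-p},$$

$$\Sigma_L = \frac{\alpha^2}{\beta^2}A_0 + \frac{\alpha^2-\beta^2}{2\beta}\left[\frac{A_1}{\alpha-\beta} - \frac{A_2}{\alpha+\beta} - \frac{A_0}{\beta}\right]e^{-\beta t},$$

$$\Sigma_R = \frac{\alpha^2-\beta^2}{2\beta}\left[\frac{A_0}{\beta} + \frac{A_1 e^{-\alpha}}{\alpha+\beta} - \frac{A_2 e^{\alpha}}{\alpha-\beta}\right]e^{\beta(t-1)}.$$

Condition (15) is satisfied if

$$A_1 e^{-\alpha/2} = A_2 e^{\alpha/2} = \tfrac{1}{2}\,Q A_0$$

where

$$Q = \frac{\alpha^2 - \beta^2}{\beta\left[\alpha\sinh\dfrac{\alpha}{2} + \beta\cosh\dfrac{\alpha}{2}\right]}.$$

Hence

$$W(t) = \frac{1 + Q\cosh\alpha\left(t - \tfrac12\right)}{1 + \dfrac{2Q}{\alpha}\sinh\dfrac{\alpha}{2}} \qquad (0 \le t \le 1).$$

In the limit as $\alpha \to 0$,

$$W(t) = \frac{2 + \beta + \beta^2\,t(1-t)}{2 + \beta + \tfrac16\beta^2} \qquad (0 \le t \le 1).$$

In terms of expressions (12), Section 11.3,

$$W(t) = \frac{W_0(t) + k\,W_1(t)}{1 + k} \qquad (0 \le t \le 1),$$

where $k = \tfrac16[\beta^2/(2 + \beta)]$. This is reminiscent of
Stibitz's results mentioned in Section 10.3.

Example of Case III

Let $S(p) = 1/(p + \beta)$. Then, by (12) and the rule for appending
singularities in Case III,

$$W(t) = A_0 + A_1\delta_0(t) + A_2\delta_0(t - 1) \qquad (0 \le t \le 1).$$

Hence

$$Y(p) = \frac{A_0}{p} + A_1 - \left[\frac{A_0}{p} - A_2\right]e^{-p},$$

$$\Sigma_L = \frac{A_0}{\beta^2} - \frac{A_0 - \beta A_1}{2\beta^2}\,e^{-\beta t}$$

and

$$\Sigma_R = \frac{A_0 - \beta A_2}{2\beta^2}\,e^{\beta(t-1)}.$$

Condition (15) is satisfied if

$$A_1 = A_2 = \frac{A_0}{\beta}.$$

Hence

$$W(t) = \frac{1 + \dfrac{1}{\beta}\delta_0(t) + \dfrac{1}{\beta}\delta_0(t-1)}{1 + \dfrac{2}{\beta}} \qquad (0 \le t \le 1).$$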
MEMORANDUM  FOR  FILE


Introduction 

The transient response behavior of a long chain of invariable
four-terminal networks connected unilaterally in tandem is of
primary importance in the design of cross-country wire
communication systems, since the successful operation of such
equipment depends upon the rapid damping of transients caused by
suddenly applied inputs.

While the emphasis in the memorandum will be directed toward
coaxial systems consisting of self-regulating repeaters spaced at
3-7 mile intervals and spanning distant points, the results are of
a more general nature and would apply, with obvious modifications
and corresponding interpretations, to any configuration involving
a large number of four-terminal linear invariable networks
connected unilaterally in tandem.

It will be shown that there are two fundamentally different types
of transient response possible depending upon the gain
characteristic of the transfer ratio of the individual
four-terminal linear networks comprising the system. The first
type of response, while satisfactory, is difficult to achieve in
practice because of the stringent requirements on the gain
characteristic of the transfer ratio. The second, a case often
encountered in practice, will be shown to be unsatisfactory in
general since it leads to build-up and overloading in any physical
system comprising a large number of such networks. However, a
guiding design principle will be suggested which, it is believed,
will enable us to minimize the worst of the effects, and make the
successful operation of a system of the type envisaged here
possible.

This memorandum is divided into two parts. In the first the
problem is defined physically and then formulated mathematically.
Following this, the history of the problem is discussed briefly
after which the new results are summarized. Finally, this part
concludes with a discussion of their interpretation and
implications for the coaxial system. The second part presents the
detailed mathematical arguments which led to the new results of
part one.


PART  I 


Statement  of  the  Problem 

The  analysis  in  this  memorandum  is  directed  toward 
the  understanding  of  certain  anomalous  effects  which  a  long 
chain  of  self-regulating  telephone  repeaters  may  exhibit  at  its 
output  when  the  input  end  of  the  chain  is  subject  to  a  transient 
disturbance  (Cf.  Figure  1). 

The gain settings of the repeaters in such a chain are usually
controlled by the level of a pilot frequency somewhere in the
communication band and the regulation is designed to compensate
for low frequency phenomena (up to approximately one cycle per
second) such as the diurnal change in line resistance. The
repeaters in the chain are normally absolutely stable devices so
that any transient which is presented to the input of any one of
them will be evanescent in time at the output of that repeater.

Since transients are not damped out instantaneously even in
absolutely stable devices, a transient disturbance at the input to
the first repeater in such a chain will be propagated down the
chain. It has been experimentally observed that under certain
conditions the maximum amplitude of a transient disturbance may
increase as the disturbance is propagated from one repeater to the
next and in some cases there may be many oscillations of
sufficiently large amplitude to render the system inoperative
because of prolonged overloading.

If the entire chain from its input to its output end is considered
as a whole, the chain does behave then in many respects like an
unstable non-linear device in spite of the fact that each repeater
in the chain is absolutely stable.

Since it is obvious that the above type of behavior is at best
undesirable in a cross-country link, it is necessary that its
cause be thoroughly understood and that all possible steps be
taken either to suppress it or, if this is not possible, at least
to minimize its effects.


Although it is not reasonable to expect that transient
oscillations can be kept from propagating down the line, or that
it is possible to isolate the line from all transient
disturbances, it is reasonable to seek a means of guaranteeing
that the transients that are propagated down the line will never
possess amplitudes that exceed the magnitude of the original
disturbance or to seek a way to guarantee that the maximum
response of the transient oscillations will occur so shortly after
the initial disturbance that physical apparatus will be incapable
of following or distinguishing it from the unavoidable initial
disturbance. A way of guaranteeing the first of these will be
discussed at length and a suggestion will be made which it is felt
will guarantee the second, although no rigorous proof of this last
fact has yet been given.

Fig. 2 represents a schematic drawing of a typical satisfactory
type of transient response which might result from a unit step
input to the first unit of Fig. 1. Fig. 3, on the other hand,
represents a schematic drawing of a typical unsatisfactory type of
transient response which could result from the same input to a
system of the type of Fig. 1 which had different characteristics.
Briefly then, the problem to be discussed is that of determining
the relationships between the network characteristics and the
transient response for networks of the form of Fig. 1.

Mathematical  Formulation  of  the  Problem 

A sudden change in level in the pilot frequency before the n-th
repeater results in the modulation of this frequency, changing it
from its normal form

$$A \sin \omega_c t$$

to

$$A \sin \omega_c t\,[1 + f(t)]$$

where $f(t)$ represents the modulation introduced by the
transient.

After passage through the n-th repeater, this last expression is
transformed into

$$A \sin(\omega_c t + \varphi)\,[1 + g(t)],$$

where the repeater and regulator have (possibly) changed the
carrier by the addition of the phase angle $\varphi$ and have
modified the original envelope $A[1 + f(t)]$ into $A[1 + g(t)]$.

It is clear that from the standpoint of regulation it is
sufficient to limit discussion to the transformation of $f(t)$
into $g(t)$.*

The exact relationship between $f(t)$ and $g(t)$, of course,
depends upon the characteristics of the repeater-regulator
circuits which are in general non-linear. However, for small
signal inputs their behavior may be satisfactorily represented by
that obtained from a linear invariable four-terminal network.
Thus, the chain of self-regulating repeaters may be replaced, for
the purpose of mathematical analysis, by a chain of linear
invariable four-terminal networks having a common transfer ratio
$y(p)$. Thus, the blocks of Fig. 1 will be idealized as being such
linear four-terminal networks throughout the analysis.

Because regulation is designed to compensate for low frequency
phenomena, certain characteristics that $y(p)$ should possess are
known a priori; namely,

(1)     $y(p)$ must represent a high-pass system. That is,
$y(p) \to 1$ as $p \to \infty$.

(2)     $y(0)$ should be zero if, in the terminology of servo
theory, there is to be no static error.

In terms of $y(p)$, the design of a self-regulating system reduces
to two problems:

(I)     Given $y(p)$, to calculate the transient behavior of the
chain of self-regulating repeaters.

(II)     The design of a system having a $y(p)$ which leads to
satisfactory transient behavior.

The rest of the memorandum will be concerned largely with the
first of these. The calculations will be carried out in general
terms and the different types of possible responses will be
described in terms of the characteristics of $y(p)$.

*  Transit time between repeaters is neglected throughout this
memorandum. More exactly, we choose a different origin of time at
each repeater, so that the transit time does not appear explicitly
in the formulae.
Mathematically the problem discussed in this memorandum can be
formulated as follows: If $y(p)$ represents the common
steady-state transfer ratio of the four-terminal linear units
shown connected in tandem in Figure 1, the output voltage response
of the n-th unit, $V_n(t)$, is given by the inverse Laplace
integral:

(1)     $V_n(t) = \dfrac{1}{2\pi i}\displaystyle\int_{c-i\infty}^{c+i\infty} y(p)^n\,e^{pt}\,V_0(p)\,dp$

where $V_0(p)$ represents the spectrum of the input voltage.

For an impulsive input of intensity $V_0$ applied at time $t = 0$,

$V_0(p) = V_0$.

For a step function input of height $V_0$ applied at time $t = 0$,

$V_0(p) = V_0/p$.

Specifically,  this  memorandum  will  be  devoted  to  the 
study  of  the  behavior  of  Vn(t)  for  large  values  of  n. 

Four-terminal networks are normally classed as low-, band-, or
high-pass depending upon the character of $|y(i\omega)|$. Typical
examples of $|y(i\omega)|$ are shown in Figure 4a, in which,
following the usual practice, $|y(i\omega)|$ has been normalized to
be unity at $\omega = 0$ in the low-pass case; at
$\omega = \omega_0$ (the mid-band frequency), in the band-pass case;
and at $\omega = \infty$ in the high-pass case.

From the viewpoint of the asymptotic behavior of the system in
Figure 1, it is convenient to modify this classification somewhat
when speaking of the over-all gain characteristic,
$|y(i\omega)|^n$, of the transfer ratio of a system comprised of n
units. For sufficiently large n, it is clear that $|y(i\omega)|^n$
would lead to curves of the type shown in Figure 4b corresponding
to the low-pass, band-pass and high-pass curves of Figure 4a.
Thus, for sufficiently large n, the gain curves B', C', and D' of
Figure 4b are seen to exhibit the type of behavior normally
associated with a band-pass characteristic. A' and E', on the
other hand, exhibit behavior of the type normally classified as
low-pass and high-pass. For these reasons, the terms low- and
high-pass will henceforth be reserved for those gain
characteristics which are always less than their values at
$\omega = 0$ and $\omega = \infty$, respectively. The term band-pass
will be used to cover all other cases; namely, those in which
$|y(i\omega)|$ possesses one or more maxima at finite frequencies,
the values of which exceed the values of $|y(i\omega)|$ at both
zero and infinity.

History  of  the  Problem


Several  people  have  considered  this  problem  in  the 
above  mathematical  form.    Before  proceeding  to  a  discussion  of 
the  results  of  the  general  theory,  it  will  be  instructive  to 
consider  a  few  illustrative  examples  of  their  results. 


Let

(2)     $y(p) = \dfrac{p}{p+1}$.*

The gain characteristic is clearly of the high-pass type and
satisfies (1) and (2) of Page 6. If the input voltage is a unit
step, then, by the theorem of residues,

$V_n(t) = \dfrac{1}{(n-1)!}\left[\dfrac{d^{n-1}}{dp^{n-1}}\left(p^{n-1}e^{pt}\right)\right]_{p=-1} = e^{-t}L_{n-1}(t)$

where $L_{n-1}(t)$ denotes the Laguerre polynomial of degree
$(n-1)$. A plot of $V_n(t)$ for n = 1, 2, ..., 10 is shown in
Figure 5. It is known that for large n

$L_n(t) \simeq \dfrac{1}{\sqrt{\pi}}\,e^{t/2}\,(nt)^{-1/4}\cos\left[2(nt)^{1/2} - \dfrac{\pi}{4}\right]$

where $\simeq$ is to be interpreted as "asymptotically equal to."
Thus

$V_n(t) \simeq \dfrac{1}{\sqrt{\pi}}\,e^{-t/2}\,(nt)^{-1/4}\cos\left[2(nt)^{1/2} - \dfrac{\pi}{4}\right]$.

A plot of the approximate "envelope"

$\dfrac{1}{\sqrt{\pi}}\,e^{-t/2}\,(nt)^{-1/4}$

is given for n = 50, 100, 150, 200, and 250 in Figure 6.

*This example was first treated by L. A. MacColl (MM-39-325-166),
9/11/39 and W. H. Wise (MM-38-343-22), 8/2/38. The above treatment
follows that of MacColl.

The response in this case is seen to be both amplitude and
frequency modulated, the "instantaneous frequency" in the sense of
frequency modulation theory being given by

$\omega' = \dfrac{d}{dt}\left(2(nt)^{1/2}\right) = \left(\dfrac{n}{t}\right)^{1/2}$

while the envelope of the amplitude modulation is approximately
exponential. In particular, the type of behavior found here can be
considered satisfactory since there is no tendency for the
magnitude of the largest overshoot to increase without limit as
the number of repeaters is increased. As will be shown later, this
type of behavior is typical of any network having a high-pass
characteristic in the generalized sense of that term as it has
been defined above.
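
The behavior described above can be reproduced numerically. The sketch below (Python; the function names are illustrative) evaluates the unit-step response of n tandem units with $y(p) = p/(p+1)$ as $e^{-t}L_{n-1}(t)$, using the standard three-term recurrence for the Laguerre polynomials, and compares it with the large-n cosine approximation quoted in the text. It assumes a unit step, $V_0 = 1$.

```python
import math


def laguerre(k: int, t: float) -> float:
    """Laguerre polynomial L_k(t) via (j+1) L_{j+1} = (2j+1-t) L_j - j L_{j-1}."""
    prev, cur = 1.0, 1.0 - t  # L_0 and L_1
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - t) * cur - j * prev) / (j + 1)
    return cur


def step_response(n: int, t: float) -> float:
    """Unit-step response after n units, each with y(p) = p/(p+1)."""
    return math.exp(-t) * laguerre(n - 1, t)


def asymptotic_response(n: int, t: float) -> float:
    """Large-n form pi^(-1/2) e^(-t/2) (nt)^(-1/4) cos(2 sqrt(nt) - pi/4)."""
    return (
        math.exp(-t / 2.0)
        * (n * t) ** -0.25
        * math.cos(2.0 * math.sqrt(n * t) - math.pi / 4.0)
        / math.sqrt(math.pi)
    )


if __name__ == "__main__":
    n = 100
    for t in (0.5, 1.0, 2.0, 4.0):
        print(t, step_response(n, t), asymptotic_response(n, t))
    # Largest excursion over a grid of times, for several chain lengths.
    for n in (1, 10, 50, 100):
        peak = max(abs(step_response(n, 0.01 * i)) for i in range(2001))
        print(n, peak)
```

The second loop illustrates the point made in the text: the largest excursion does not grow with the number of units, because the response starts at the full step height and decays from there.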

In MM-40-3500-92 dated 10/14/1940, J. G. Kreer and J. H. Bollman
concluded that the appropriate $y(p)$ for a self-regulating
repeater employing a directly heated thermistor element in the
control device was given by

It should be observed that for $\alpha \ne 0$ this transfer ratio
does possess static error. L. A. MacColl in MM-40-130-270 treated
this case for $|\alpha| < 1$ and found that the system exhibited
essentially the same type of satisfactory behavior as that
discussed above.
(2)  A slightly more complicated example is given by

(4)     $y(p) = \dfrac{p(p + \alpha)}{(p + 1)^2}$.*

It is easily seen that for $\alpha < \sqrt{2}$, $|y(i\omega)|$ is a
high-pass characteristic in that $|y(i\omega)| < 1$ for all finite
$\omega$ and $|y(i\omega)| \to 1$ as $\omega \to \infty$. On the
other hand, if $\alpha > \sqrt{2}$, $|y(i\omega)|$ possesses a
maximum greater than 1 at some finite frequency. $|y(i\omega)|$ is
illustrated by curve I in Figure 7 for $\alpha = 1.4$ (high-pass)
and by Figure 8 for $\alpha = 2$ (band-pass). The response $V_n(t)$
to a unit step function is shown in Figures 9 and 10 for these two
cases with n = 1, 2, ..., 9. The character of the response is seen
to be of a radically different kind for these two values of
$\alpha$.

For $\alpha = 1.4$ the response is seen to be of the same type as
that encountered in the first example. For $\alpha = 2$, on the
other hand, it seems to represent an oscillation in which the
magnitude of the largest overshoot is increasing without limit as
n tends to infinity. Later it will be shown that this is in fact
the case and that satisfactory operation is impossible for a large
number of repeaters in this case.

From this and other considerations L. A. MacColl conjectured that
a necessary and sufficient condition that the response $V_n(t)$ be
bounded for all n was that the transfer ratio $y(p)$ have no net
gain at any frequency. Mathematically expressed, a necessary and
sufficient condition that

$|V_n(t)| < M$ for all n,

where M is independent of n and t, is that

(M)     $|y(i\omega)| \le 1$ for all real frequencies $\omega$.

Physically, the condition on $y(i\omega)$ prevents the transfer
ratio $|y(i\omega)|^n$ for a system using n units from having a
tremendous gain at any particular frequency.
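
MacColl's condition can be checked numerically for the second example, $y(p) = p(p+\alpha)/(p+1)^2$. A minimal sketch in Python follows; the grid search, its limits, and the function names are illustrative assumptions rather than anything taken from the memorandum.

```python
import math


def gain(omega: float, alpha: float) -> float:
    """|y(i omega)| for y(p) = p (p + alpha) / (p + 1)^2."""
    w2 = omega * omega
    return abs(omega) * math.sqrt(w2 + alpha * alpha) / (w2 + 1.0)


def peak_gain(alpha: float, w_max: float = 50.0, steps: int = 20_000) -> float:
    """Largest gain found on a uniform grid 0 < omega <= w_max (illustrative)."""
    return max(gain(w_max * i / steps, alpha) for i in range(1, steps + 1))


def satisfies_maccoll(alpha: float) -> bool:
    """True when no sampled frequency shows a net gain, i.e. |y| <= 1."""
    return peak_gain(alpha) <= 1.0


if __name__ == "__main__":
    for alpha in (1.0, 1.4, 2.0):
        g = peak_gain(alpha)
        # g**n is the peak over-all gain of a chain of n units.
        print(alpha, satisfies_maccoll(alpha), g, g ** 100)
```

A finite grid cannot prove the condition for all real frequencies; it only shows on which side of the boundary a given $\alpha$ falls and how quickly the peak over-all gain $|y|^n$ grows once the condition is violated.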


*This case was also treated by L. A. MacColl, but no memorandum
on it was ever written.


In one sense this memorandum could be summarized as a proof of
this conjecture. In particular, a direct proof of the necessity of
MacColl's condition (M) is given in the second part. The remainder
of that part is devoted to an indirect proof of the sufficiency.
The argument consists in exhibiting the two types of possible
responses; the first being that associated with a $y(p)$
satisfying MacColl's condition and the second that resulting from
a $y(p)$ which violates it at one or more frequencies.

Statement  of  Results 


The detailed results of the sufficiency argument are discussed
conveniently in terms of the generalized characterization of
high-, band-, and low-pass $y(p)$'s as given on page 8. The results
will be taken up in that order.

High  Pass 

In terms of the above classification, the class of high pass
$y(p)$'s consists of just those functions which satisfy MacColl's
condition and are therefore those from which a satisfactory
response could be expected. For the $y(p)$'s in this class, it is
clear on physical grounds that the maximum contribution to the
response $V_n(t)$ of equation (1) will come from the large values
of $|\omega|$ since for these values of $|\omega|$,
$|y(i\omega)|^n \to 1$ while for all other values of $|\omega|$,
$|y(i\omega)|^n \to 0$. Using the first three terms of the Laurent
expansion of $y(i\omega)$ about $\omega = \infty$, one finds:

(5)*     $y(i\omega) \simeq 1 + \dfrac{a\,i}{\omega} + \dfrac{b}{\omega^2}$,

(6)     $|y(i\omega)| \simeq \left[1 + \dfrac{a^2 + 2b}{\omega^2}\right]^{1/2}$,

(7)     Angle $y(i\omega) \simeq \dfrac{a}{\omega}$.

*  It is assumed that $a > 0$, $b < 0$, and that $2b + a^2 < 0$.
These assumptions correspond to a second order maximum at
$|\omega| = \infty$ and to a monotonic decreasing phase function
for $y(p)$ as $|\omega| \to \infty$.
If these approximations, which are valid for $|\omega|$
sufficiently large, are introduced into equation (1), it can be
shown that the principal contribution to $V_n(t)$ for a unit step
input is given by:

(8)     $V_n(t) \simeq \pi^{-1/2}\,(nat)^{-1/4}\exp\left[\dfrac{(a^2 + 2b)\,t}{2a}\right]\cos\left(2\sqrt{nat} - \dfrac{\pi}{4}\right)$.

This, with a suitable interpretation of the constants a and b, is
seen to be of the same general form as the response obtained by
MacColl for $y(p) = p/(p + 1)$ as given above. Just as in that
example the response is both frequency and amplitude modulated.
The instantaneous frequency of oscillation is again given by

$\omega' = \dfrac{d}{dt}\left(2(nat)^{1/2}\right) = \left(\dfrac{na}{t}\right)^{1/2}$.
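
As a worked illustration (not part of the original memorandum), the constants a and b of expansion (5) can be computed for the family $y(p) = p(p+\alpha)/(p+1)^2$ used in the examples. Writing $p = i\omega$ and expanding in powers of $1/p$:

```latex
\begin{aligned}
y(p) &= \frac{1 + \alpha/p}{(1 + 1/p)^2}
      = \Bigl(1 + \frac{\alpha}{p}\Bigr)\Bigl(1 - \frac{2}{p} + \frac{3}{p^2} - \cdots\Bigr)
      = 1 + \frac{\alpha - 2}{p} + \frac{3 - 2\alpha}{p^2} + \cdots \\
y(i\omega) &\simeq 1 + \frac{(2-\alpha)\,i}{\omega} + \frac{2\alpha - 3}{\omega^2},
\qquad a = 2 - \alpha, \quad b = 2\alpha - 3, \\
a^2 + 2b &= (2-\alpha)^2 + 2(2\alpha - 3) = \alpha^2 - 2 .
\end{aligned}
```

With $\alpha = 1$ this gives $a = 1$, $b = -1$, which is the case $y(p) = p/(p+1)$; and $a^2 + 2b < 0$ exactly when $\alpha < \sqrt{2}$, in agreement with the critical value at which the characteristic changes from high-pass to band-pass.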

The  gain  for 


y(p)   =  P(P  i 

(P  I  D2 

is shown on curve I of Figure 11. Curve II of this figure
represents $|y(i\omega)|^{100}$ for this $y(p)$. For this example
and n = 100, the true gain $|y(i\omega)|^{100}$ and the gain
approximation resulting from equation (6) are indistinguishable on
the scale of Figure 11.

The corresponding phase characteristic for $y(p)^{100}$ is plotted
on Figure 12 where, for reasons which will appear in Part II, the
actual frequency has been replaced by

$\omega' = \dfrac{\omega}{\sqrt{n}}$.

Again, on the scale of Figure 12 the actual phase is
indistinguishable from the approximation resulting from equation
(7). Figs. 7 and 13 present the same information for

$y(p) = \dfrac{p(p + 1.4)}{(p + 1)^2}$

and n = 100.
Again the agreement between the actual phase and the
approximation is excellent. However, there is a considerable error
in the gain approximation for small $|\omega|$. This large error is
unquestionably due to the fact that the value $\alpha = 1.4$ is
near the critical value $\alpha = \sqrt{2}$ at which the
characteristic changes from high-pass to band-pass.

Agreement with the above asymptotic formula can of course be
obtained by increasing n sufficiently. Alternately, for n = 100, a
better approximation to the gain can be obtained by writing

$y(i\omega) \simeq 1 + \dfrac{a\,i}{\omega} + \dfrac{b}{\omega^2} + \dfrac{c\,i}{\omega^3} + \dfrac{d}{\omega^4}$

and

$|y(i\omega)| \simeq \left[1 + \dfrac{2b + a^2}{\omega^2} + \dfrac{2d + b^2 + 2ac}{\omega^4}\right]^{1/2}$.

This approximation leads to a curve which is indistinguishable
from that of $|y(i\omega)|^{100}$ in Figure 7. With this
approximation, one finds the following expression for $V_n(t)$
when the input is a unit step function:

$V_n(t) \simeq \pi^{-1/2}\,(nat)^{-1/4}\cos\left(2\sqrt{nat} - \dfrac{\pi}{4}\right)\exp\left(\dfrac{(a^2 + 2b)\,t}{2a}\right)\left\{1 + \dfrac{(2d + b^2 + 2ac)\,t^2}{2na^2}\right\}$.

This expression is seen to approach that given by equation (8) as
$n \to \infty$. Thus one can conclude that the response will always
be satisfactory if $y(p)$ belongs to the class of high-pass
characteristics.

Band-Pass  Case 

MacColl's condition is clearly violated whenever $|y(i\omega)|$
has one or more relative maxima greater than 1 at finite
frequencies. For simplicity the case where $|y(i\omega)|$ has only
one such maximum at $\omega = \omega_0$ will be treated first. It
will furthermore be assumed that this maximum is of the second
order; i.e.

$\left.\dfrac{d^2}{d\omega^2}\,|y(i\omega)|\,\right|_{\omega = \omega_0} \ne 0.$

Under these conditions, it is physically clear that the maximum
contribution to the response $V_n(t)$ as given by equation (1)
will be due to those frequencies near $\omega_0$, at which
$|y(i\omega)|$ possesses its maximum, since as n increases this
region becomes increasingly more important than all the rest. It
is also clear that the time of maximum response will be given by
the delay time experienced by the frequency $\omega_0$ in passing
through the network. This is known to be given by
$t_0 = -nB'(\omega_0)$ where $B'(\omega_0)$ denotes the slope of
the phase characteristic $B(\omega)$ in the expression

(10)     $y(i\omega) = A(\omega)\exp(iB(\omega))$.


If $A(\omega)$ and $B(\omega)$ are expanded in a Taylor's series
about $\omega = \omega_0$ and terms up to the second order
retained, it can be shown that the response to a unit impulse
function is given by


(ii)  vn(t)  =  A(^Jn 

VZn 


G(u0)  exp  ( 


-(t-to)cH(0)n) 


o/  )  cos  |u>Qt  +  nB(uQ) 


where 


0(»0)   -  n-V8j 
( 


(  — 


A"("0) 


-1/4 


*  CB»»(w0)n 


H(«0) 


A'  '(cup) 


(I  A"l«Q) 


2> 


>  0 
(B"(w0)  A{« J)
io((,o)  =  arctanj     2a,,([Uq)  ) 


) 


$t_0 = -nB'(\omega_0)$.


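Thus $V_n(t)$ can be interpreted as an amplitude modulated wave
with an envelope proportional to the Gauss error curve

$\exp\left\{-\dfrac{(t - t_0)^2}{2n}\,H(\omega_0)\right\}$

with a standard deviation given by

$\sigma = \left\{\dfrac{n\left[\left(\dfrac{A''(\omega_0)}{A(\omega_0)}\right)^2 + \left(B''(\omega_0)\right)^2\right]}{-\dfrac{A''(\omega_0)}{A(\omega_0)}}\right\}^{1/2}$.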


The standard deviation $\sigma$ is of course a convenient measure
of the duration of the disturbance. The maximum response occurs
for time $t_0 = -nB'(\omega_0)$ at which time the amplitude is
proportional to

$\dfrac{A(\omega_0)^n}{\sqrt{n}}$.

Thus if $A(\omega_0) > 1$, the maximum response will represent a
value which is very large compared with unity, the magnitude of
the original disturbance, if n is large. This would force any
system involving vacuum tubes to overload if n were sufficiently
large.
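
To make the growth concrete, the sketch below (Python; names are illustrative) works out the quantities in this argument for $y(p) = p(p+\alpha)/(p+1)^2$ with $\alpha > \sqrt{2}$: the peak frequency $\omega_0$, the peak gain $A(\omega_0)$, the delay per unit $-B'(\omega_0)$, and the amplitude measure $A(\omega_0)^n/\sqrt{n}$. The closed forms in the comments were derived for this example from $|y(i\omega)|^2 = \omega^2(\omega^2+\alpha^2)/(\omega^2+1)^2$ and are not quoted from the memorandum.

```python
import math


def peak_frequency(alpha: float) -> float:
    """omega_0 maximizing |y(i omega)|; setting d|y|^2/d(omega^2) = 0 gives
    omega_0^2 = alpha^2 / (alpha^2 - 2), valid only for alpha > sqrt(2)."""
    if alpha * alpha <= 2.0:
        raise ValueError("no finite-frequency maximum unless alpha > sqrt(2)")
    return alpha / math.sqrt(alpha * alpha - 2.0)


def gain(omega: float, alpha: float) -> float:
    """A(omega) = |omega| sqrt(omega^2 + alpha^2) / (omega^2 + 1)."""
    w2 = omega * omega
    return abs(omega) * math.sqrt(w2 + alpha * alpha) / (w2 + 1.0)


def delay_per_unit(alpha: float) -> float:
    """-B'(omega_0), with B = pi/2 + atan(omega/alpha) - 2 atan(omega), omega > 0."""
    w0 = peak_frequency(alpha)
    slope = alpha / (alpha * alpha + w0 * w0) - 2.0 / (1.0 + w0 * w0)
    return -slope


def amplitude_measure(alpha: float, n: int) -> float:
    """A(omega_0)^n / sqrt(n), the factor the peak response is proportional to."""
    return gain(peak_frequency(alpha), alpha) ** n / math.sqrt(n)


if __name__ == "__main__":
    alpha = 2.0
    print(peak_frequency(alpha), gain(peak_frequency(alpha), alpha))
    for n in (10, 50, 100):
        print(n, n * delay_per_unit(alpha), amplitude_measure(alpha, n))
```

For $\alpha = 2$ the peak lies at $\omega_0 = \sqrt{2}$ with $A(\omega_0) = 2/\sqrt{3}$, so the factor $\sqrt{n}$ in the denominator is quickly overwhelmed by the geometric growth of $A(\omega_0)^n$.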

These properties are summarized in Figures (14) and (15). Figure
(14) is a plot of the response for values of t near $t_0$ for a few
values of n for the example given by equation (4) where
$\alpha = 2$. Figure (15) is a plot of the maximum response for a
few values of n for different values of the parameter $\alpha$.

It should be remarked that the above approximation to the gain
which was obtained by keeping only the first two terms of the
expansion of $A(\omega)$ about $\omega = \omega_0$ could only be
expected to be a reasonable one for fairly large values of n,
since it represents a usually unsymmetric gain characteristic by a
symmetric function. A better or second approximation can be
obtained by using three terms of the Taylor's expansion instead of
two. Just as in the high pass case, the retention of this extra
term gives rise to a second term in the expression for $V_n(t)$
but it does not fundamentally alter the characteristics of the
response since the correction term vanishes for $t = t_0$, at
which time the response is still a maximum, with the same
amplitude as before. Its only effect is to take cognizance of the
unsymmetrical character of the gain characteristic $A(\omega)$ and
to change the resulting response envelope to an unsymmetrical one.
Of course, it also modifies the phase of the oscillation inside
the envelope in a complicated way without changing the fundamental
frequency of oscillation.

For these reasons and because of the complexity of the resulting
expression, it will not be written down here explicitly although
the explicit approximation to the gain $A(\omega)$ will be
discussed in Part II.

The two approximations to the gain are illustrated for equation
(4) with $\alpha = 2$ in Figure 16 for n = 100. In this case

$A(\omega) = \dfrac{|\omega|\sqrt{\omega^2 + 4}}{\omega^2 + 1}$.

As can be seen from the figure, the second approximation does in
fact represent $A(\omega)$ over the significant range of
frequencies near $\omega_0$ from which it can be concluded that the
response will be unsatisfactory. Figure (14), previously referred
to, furnishes a picture of the envelope response as obtained from
the first approximation.

In the event that $A(\omega)$ takes on its maximum value at more
than one place in the finite frequency range, it is clear that the
above results can be generalized as follows:

Let $V_{ni}(t)$ be the response of the form given by equation (11)
due to a maximum at $\omega = \omega_i$. Let the time of maximum
response from this maximum be denoted by $t_i = -nB'(\omega_i)$.
Then the total response is clearly given by the expression

$V_n(t) = \displaystyle\sum_{i=1}^{k} V_{ni}(t)$,

if there are k relative maxima. Unless the values of $A(\omega)$ at
the points $\omega = \omega_i$ are nearly the same, it is also
clear that only those terms of the above sum which correspond to
the largest maxima of $A(\omega)$ will be of significance.

The  band-pass  case  is  also  discussed  briefly  for  unit 
step  inputs  in  Part  II. 

Low  Pass  Case 

Since the low-pass case differs from the band-pass case
only in that $A(\omega)$ has its maximum for $\omega = 0$ instead of at
$\omega = \omega_0 \neq 0$, the results of the two are very similar. The results in
the low-pass case are simpler because it will be recalled that
$B(\omega)$ (as defined by equation 10) is an odd function of $\omega$ for any
physical network. This forces both $B(0)$ and $B''(0)$ to be zero so
that for an impulsive input one obtains the simple formula:


(12)     j  It)  Vim  In"3/2 
n  -/2n  ( 


A"(0) 


-1/2)  (t-tQ)2  A(0)) 

J  exp  [   2n  A'*  (0)j 


This  result  corresponds  to  the  well-known  formula  from 
transmission  line  theory  for  non-distortionless  lines. 
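
The low-pass result can be checked numerically on a concrete chain. The sketch below is an added illustration, not part of the original analysis: it assumes identical stages with transfer ratio y(p) = 1/(1 + p), for which A(0) = 1, A''(0) = -1 and B'(0) = -1, so that the n-stage impulse response is known exactly as t^(n-1) e^(-t)/(n-1)!. The Gaussian estimate is centred at t0 = -nB'(0) with variance n|A''(0)|/A(0); its prefactor is fixed here by requiring the pulse area to equal A(0)^n, which is an assumption of this sketch. The function names are hypothetical.

```python
import math


def exact_rc_chain_impulse(n: int, t: float) -> float:
    """Impulse response of n cascaded stages y(p) = 1/(1+p): t^(n-1) e^(-t) / (n-1)!."""
    # Work with logarithms so that large n does not overflow.
    return math.exp((n - 1) * math.log(t) - t - math.lgamma(n))


def gaussian_estimate(n: int, t: float, A0: float = 1.0,
                      A2: float = -1.0, B1: float = -1.0) -> float:
    """Gaussian pulse centred at t0 = -n B'(0) with variance n |A''(0)| / A(0).

    A0, A2 and B1 stand for A(0), A''(0) and B'(0); the defaults belong to
    the assumed stage y(p) = 1/(1+p).
    """
    t0 = -n * B1
    variance = n * abs(A2) / A0
    return A0 ** n * math.exp(-(t - t0) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


if __name__ == "__main__":
    n = 400
    for t in (380.0, 400.0, 420.0):
        print(t, exact_rc_chain_impulse(n, t), gaussian_estimate(n, t))
```

For n = 400 the two values should agree to well under one percent at the peak t = t0 = 400; one standard deviation away the skewness of the exact response leaves a discrepancy of a few percent.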


Remarks 

From  the  practical  viewpoint  the  above  results  have  the 
following  implications  for  communications  systems  such  as  a 
cross-country  coaxial  telephone  system  employing  self-regulation 
repeaters  spaced  at  intervals  of  a  few  miles. 

(1)     If  the  transfer  characteristic  of  each  individual 
network  is  of  the  high-pass  type  (in  the  sense  in  which  this  term 
has  been  used  above)  then  the  transient  response  will  never  exceed 
the  initial  value  of  the  disturbing  input  voltage  and  it  will 
be  damped  out  so  that  the  operation  of  the  communication  system 
would generally be considered satisfactory.

(2)    If the network is not of the high-pass type, the
usual practical case, and there is any net gain in the system,
which is peaked at $\omega_0$, then for even a small number of units the
response will exceed the initial input at the time given by

$$t_0 = -nB'(\omega_0)$$

where

$$A'(\omega_0) = 0$$

and if the number of units is sufficiently large the output
from the n-th unit will be large enough to cause severe overloading.
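
To make item (2) concrete, the following sketch locates the gain peak of a rational transfer ratio on a frequency grid and evaluates t0 = -nB'(ω0). The example network y(p) = p^2/(p^2 + 0.5p + 1), the grid limits and the function name are assumptions chosen for illustration. B'(ω) is obtained from the identity B'(ω) = Re[y'(p)/y(p)] at p = iω, which follows from log y(iω) = log A(ω) + iB(ω).

```python
import numpy as np


def peak_and_delay(num, den, n, omega_max=10.0, samples=200_001):
    """Return (omega0, A(omega0), t0) for y(p) = num(p)/den(p).

    num and den are polynomial coefficients, highest power first.
    The peak is found by a plain grid search (an assumption of this sketch).
    """
    omega = np.linspace(1e-3, omega_max, samples)
    p = 1j * omega
    gain = np.abs(np.polyval(num, p) / np.polyval(den, p))
    k = int(np.argmax(gain))
    p0 = p[k]
    # y'(p)/y(p) = num'(p)/num(p) - den'(p)/den(p); its real part is B'(omega).
    dlog = (np.polyval(np.polyder(num), p0) / np.polyval(num, p0)
            - np.polyval(np.polyder(den), p0) / np.polyval(den, p0))
    b_prime = dlog.real
    return omega[k], gain[k], -n * b_prime


if __name__ == "__main__":
    print(peak_and_delay([1.0, 0.0, 0.0], [1.0, 0.5, 1.0], n=100))
```

For this network the peak lies at ω0 = sqrt(8/7) with A(ω0) = sqrt(64/15), about 2.07, and B'(ω0) = -3.5, so a chain of 100 units has t0 = 350 while the peak gain of the chain, A(ω0)^n, is enormous.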

At  first  glance  these  implications  are  not  promising 
and  seem  to  indicate  that  the  operation  of  a  cross-country 
system  involving  several  hundred  repeaters  and  regulators  would 
be extremely difficult, since the only satisfactory characteristic
is difficult to attain in practice. However, practically the
ideal characteristic which is high pass can be approached in the
sense that the peaked frequency can be made very large. Thus
the maximum response may occur so soon after the initial disturbance
that the physical system would not be able to follow it or
to  distinguish  it  from  the  initial  disturbance  which  in  many 
cases  would  be  large  enough  to  cause  momentary  overloading  of  the 
system. 

Moreover, it is an experimental fact that in the design
of feedback regulator characteristics, forcing the peaked frequency
higher reduces the size of the peak, which in turn will permit the
use of a larger number of regulators in the system.

If this is done, the time of maximum response,
$t_0 = -nB'(\omega_0)$, will be small since $B'(\omega)$ in general is small for large
$\omega$. Assuming that the effects of the maximum response have been
treated in this way, it is natural to inquire into the type of
response which will result for finite values of $t > t_0$.

If one examines the gain characteristic curve of the
type shown in Figure (7), it is clear that for frequencies less
than some frequency $\omega$, slightly less than the peak frequency $\omega_0$,
the shape is fundamentally like that of the high-pass case.
Remembering that the phase delay of a frequency through a linear
network is given by the slope of the phase characteristic at that
frequency, it is clear that the response for values of $t$ greater
than $t_0$, the time of maximum response, will come from the
frequencies less than $\omega_0$, since the phase slope characteristic is
large for small frequencies and small for large frequencies.
Now if it is assumed that the phase characteristic $nB(\omega)$ is a
monotonic decreasing function of $\omega$, it is clear that the function
$(nB(\omega) + \omega t)$ will always be stationary at an arbitrary frequency
$\omega$, provided that $t$ is given a suitable corresponding value. Thus,
it is reasonable to expect that the response for $t \gg t_0$* will
exhibit the same type of character as that obtained in the high-pass
case discussed above. This, it will be recalled, is both
frequency and amplitude modulated with an envelope which decreases
approximately exponentially. Thus, under these circumstances it
seems reasonable to suppose that satisfactory operation of the
communication link could be obtained.

To  recapitulate,  the  most  practical  design  for  any 
system  of  the  type  envisaged  in  Figure  1,  from  the  viewpoint  of 
satisfactory  transient  response  involves  approaching  the  high- 
pass  characteristic  as  closely  as  possible  by  making  the  gain 
characteristic  of  the  transfer  ratio  peak  at  as  high  a  frequency 
as  is  practicable  and  by  keeping  the  phase  slope  characteristic 
monotonic  for  all  smaller  frequencies. 


PART  II 


Mathematical  Discussion 

Theorem I.    A necessary condition that the response $V_n(t)$ from a
chain of n four-terminal linear invariable networks subject to a
unit step input function have a common finite bound for all n is
that the transfer ratio $y(p)$ satisfy the relation

(M)  $|y(i\omega)| \le 1$ for all real values of $\omega$.


*  A different type of expansion, valid for any fixed $t$ as $n \to \infty$,
is discussed at the end of Part II.


Proof:    By hypothesis

$$|V_n(t)| \le M \quad \text{for all } n,$$

where $M$ is independent of $n$ and $t$, so that the Laplace transform

$$V_n(p) = \int_0^\infty e^{-pt}\, V_n(t)\, dt$$

exists for any $p$ with a positive real part. For a unit step input
$V_n(p) = y(p)^n / p$, and hence

$$y(p)^n = p\, V_n(p),$$

$$|y(p)|^n = |p| \left| \int_0^\infty e^{-pt}\, V_n(t)\, dt \right| \le |p| \int_0^\infty |e^{-pt}|\, |V_n(t)|\, dt \le |p|\, M \int_0^\infty |e^{-pt}|\, dt.$$

If $p = c + i\omega$ and if $c > 0$, then

$$|y(p)|^n \le \frac{\sqrt{c^2 + \omega^2}}{c}\, M$$

so that

$$\log |y(p)| \le \frac{1}{n} \log \left( \frac{\sqrt{c^2 + \omega^2}}{c}\, M \right).$$

Thus, in the limit as $n \to \infty$, it follows that for any
$p$ with a positive real part

$$\log |y(p)| \le 0$$

and hence

$$|y(p)| \le 1.$$

Since this relation holds everywhere in the right-hand half
plane, it follows from simple continuity considerations that
the maximum of $|y(i\omega)|$ never exceeds 1. Thus

$$|y(i\omega)| \le 1$$

as was to be shown.

The  remaining  discussion  will  be  devoted  to  the 
characterization  of  the  different  types  of  possible  responses 
and  will,  at  the  same  time,  furnish  an  indirect  proof  of  the 
fact  that  the  condition  (M)  on  y(p)   is  also  sufficient. 
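
Condition (M) is easy to test numerically for a rational transfer ratio. The sketch below is illustrative only: the two example networks, the frequency grid and the function names are assumptions, and a finite grid can only suggest, not prove, that the bound holds for all real ω.

```python
import numpy as np


def gain(num, den, omega):
    """|y(i omega)| for y(p) = num(p)/den(p), coefficients highest power first."""
    p = 1j * np.asarray(omega, dtype=float)
    return np.abs(np.polyval(num, p) / np.polyval(den, p))


def satisfies_condition_M(num, den, omega_max=1e3, samples=200_001, tol=1e-9):
    """Check |y(i omega)| <= 1 on a uniform grid from 0 to omega_max."""
    omega = np.linspace(0.0, omega_max, samples)
    return bool(np.max(gain(num, den, omega)) <= 1.0 + tol)


# Two assumed example networks.
high_pass = ([1.0, 0.0], [1.0, 1.0])            # y(p) = p / (p + 1)
peaked = ([1.0, 0.0, 0.0], [1.0, 0.5, 1.0])     # y(p) = p^2 / (p^2 + 0.5 p + 1)

if __name__ == "__main__":
    print(satisfies_condition_M(*high_pass), satisfies_condition_M(*peaked))
```

The first network satisfies (M) on the grid; the second has gain 2 at ω = 1, so the gain of an n-unit chain at that frequency is 2^n and no common bound can exist.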

High  Pass  Case  -  Unit  Step  Input 

If the networks comprising the system shown in
Figure 1 possess a transfer ratio having a high-pass gain characteristic
in the sense defined above, and if one writes

$$y(i\omega) = A(\omega)\, e^{iB(\omega)}$$

then the gain function $A(\omega)$ satisfies the two conditions

(A)  $A(\omega) < 1$ for all finite frequencies $\omega$.

(B)  $\lim_{\omega \to \infty} A(\omega) = 1$.

Under these conditions it is clear that, for sufficiently large n,
the main contributions to $V_n(t)$ will be due to the high values of
$|\omega|$. For convenience, $V_n(t)$ is written here in slightly
different form

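$$V_n(t) = \frac{1}{\pi} \operatorname{Re} \left\{ \int_0^\infty A(\omega)^n\, e^{i[nB(\omega) + \omega t - \pi/2]}\, \frac{d\omega}{\omega} \right\}.$$

For large values of $|\omega|$, all physical transfer ratios $y(i\omega)$
of interest to us here can be represented by an expansion
of the form*

(13)  $$y(i\omega) = 1 + \frac{ai}{\omega} + \frac{b}{\omega^2} + \frac{ci}{\omega^3} + \frac{d}{\omega^4} + \cdots$$

We shall confine our attention to the ordinary case, in which
$a > 0$, $b < 0$ and $2b + a^2 < 0$. For large values of $|\omega|$, we now
have

(14)  $$A(\omega) = \left\{ \left[ 1 + \frac{b}{\omega^2} + \frac{d}{\omega^4} + \cdots \right]^2 + \left[ \frac{a}{\omega} + \frac{c}{\omega^3} + \cdots \right]^2 \right\}^{1/2}$$

(15)  $$B(\omega) = \arctan \frac{\dfrac{a}{\omega} + \dfrac{c}{\omega^3} + \cdots}{1 + \dfrac{b}{\omega^2} + \dfrac{d}{\omega^4} + \cdots}$$

It is clear that, for $|\omega|$ sufficiently large, the
leading terms of these expressions will furnish adequate approximations
to $A(\omega)$ and $B(\omega)$. These are:

(16)  $$A(\omega) = \left[ 1 + \frac{a^2 + 2b}{\omega^2} \right]^{1/2}$$

(17)  $$B(\omega) = \frac{a}{\omega}.$$

Let $\omega_0$ be the frequency defined by the condition that
these approximations are accurate to within the arbitrarily chosen
permissible error $\epsilon$ for values of $\omega$ such that $\omega > \omega_0$. Then we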
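can write

$$V_n(t) = \frac{1}{\pi} \operatorname{Re} \left\{ \left[ \int_0^{\omega_0} + \int_{\omega_0}^{\infty} \right] A(\omega)^n\, e^{i[nB(\omega) + \omega t - \pi/2]}\, \frac{d\omega}{\omega} \right\} = \frac{1}{\pi} \operatorname{Re}\, (I_1 + I_2).$$

*  In the usual case $y(p)$ is a rational function, so that this
expansion can be readily obtained.

It is clear that

$$|I_1| \le \int_0^{\omega_0} \frac{[A(\omega)]^n}{|\omega|}\, d\omega.$$

Since $[A(\omega)]^n \to 0$ for each $\omega$ in the finite range $0 \le \omega \le \omega_0$,
it is clear that $|I_1|$ can be made negligibly small by taking
n sufficiently large. Introducing the new variable $v$ defined
by the relation

$$v = \omega \sqrt{\frac{t}{na}},$$

$I_2$ can be written as

$$I_2 = \int_{\omega_0 \sqrt{t/na}}^{\infty} \left[ 1 + \frac{(a^2 + 2b)\,t}{n a v^2} \right]^{n/2} e^{i[\sqrt{nat}\,(1/v + v) - \pi/2]}\, \frac{dv}{v}.$$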


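Letting

$$x = \frac{(a^2 + 2b)\,t}{a v^2}$$

and using the binomial expansion, one has

$$\left[ 1 + \frac{(a^2 + 2b)\,t}{n a v^2} \right]^{n/2} = \left[ 1 + \frac{x}{n} \right]^{n/2} = 1 + \frac{x}{2} + \frac{\frac{n}{2} \left( \frac{n}{2} - 1 \right)}{2!} \left( \frac{x}{n} \right)^2 + \cdots$$

$$= 1 + \frac{x}{2} + \frac{1}{2} \left( 1 - \frac{2}{n} \right) \left( \frac{x}{2} \right)^2 + \cdots = e^{x/2} + \text{terms in } 1/n.$$

Thus, for sufficiently large n, $I_2$ becomes, approximately

(18)  $$I_2 = \int_{\omega_0 \sqrt{t/na}}^{\infty} \exp \left[ \frac{(a^2 + 2b)\,t}{2 a v^2} \right] e^{i[\sqrt{nat}\,(1/v + v) - \pi/2]}\, \frac{dv}{v}.$$

In this form the principle of stationary phase can be applied to
$I_2$ (Cf. Appendix I); for the amplitude factor

$$\frac{1}{v} \exp \left[ \frac{(a^2 + 2b)\,t}{2 a v^2} \right]$$

is independent of n, while the phase function (in the notation
of the appendix)

(19)  $$\varphi(v) = \left( \frac{1}{v} + v \right)$$

is monotonic in the range of integration on each side of the
stationary point (v = 1) where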


$$\varphi'(v) = 0.$$

Physically speaking the form of equation (18) suggests
the interpretation of $V_n(t)$ as the sum of an infinite number of
complex waves whose amplitudes are slowly varying functions of $v$
and whose complex phases are rapidly varying functions of $v$.
Under this interpretation it is physically reasonable to expect
that wave interference will occur everywhere except near $v = 1$
where the phase function given by equation (19) is stationary.
This is the principle of stationary phase. It remains to
evaluate the principal contribution to $I_2$ for values of $v$ near 1.

Replacing $\varphi(v)$ by the first three terms of its Taylor's series
about $v = 1$,

$$\varphi(v) = \varphi(1) + 0 + \frac{\varphi''(1)}{2} (v - 1)^2 = 2 + (v - 1)^2,$$

the main contribution to $I_2$ is given by

$$\int_{1-\eta}^{1+\eta} \frac{1}{v}\, e^{i[2\sqrt{nat} - \pi/2]} \exp \left[ \frac{(a^2 + 2b)\,t}{2 a v^2} \right] e^{i\sqrt{nat}\,(v - 1)^2}\, dv.$$

In the interval $(1 - \eta,\ 1 + \eta)$, the amplitude factor

$$\frac{1}{v} \exp \left[ (a^2 + 2b)\,t / 2 a v^2 \right]$$

is substantially constant and may be removed from under the
integral sign and evaluated at $v = 1$. By the reasoning of
Appendix I, the contributions to the remaining integral are
not appreciably affected if the limits are changed to $(-\infty, \infty)$
respectively. Letting

$$\xi = v - 1$$

we can then write $I_2$ in the form


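$$I_2 \sim \exp \left\{ \frac{(a^2 + 2b)\,t}{2a} \right\} \exp \left\{ i \left[ 2\sqrt{nat} - \frac{\pi}{2} \right] \right\} \int_{-\infty}^{\infty} e^{i\sqrt{nat}\,\xi^2}\, d\xi.$$

By the known properties of Fresnel integrals

$$\int_{-\infty}^{\infty} e^{i\sqrt{nat}\,\xi^2}\, d\xi = \pi^{1/2}\, (nat)^{-1/4}\, e^{i\pi/4}$$

and hence

$$I_2 \sim \pi^{1/2}\, (nat)^{-1/4} \exp \left\{ \frac{(a^2 + 2b)\,t}{2a} \right\} \exp \left\{ i \left[ 2\sqrt{nat} - \frac{\pi}{4} \right] \right\}.$$

Taking the real part and dividing by $\pi$, the asymptotic expression
for $V_n(t)$ is therefore given by:

(20)  $$V_n(t) = \pi^{-1/2}\, (nat)^{-1/4} \exp \left( \frac{(a^2 + 2b)\,t}{2a} \right) \cos \left( 2\sqrt{nat} - \frac{\pi}{4} \right),$$

which is equation (8) of Part I.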
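
Equation (20) can be compared with an exactly solvable chain. The sketch below is an added illustration under stated assumptions: each unit is taken to be the simple high-pass stage y(p) = p/(p + 1), whose expansion (13) has a = 1 and b = -1 (so that a^2 + 2b = -1 < 0), and whose n-unit step response is exp(-t) L_{n-1}(t), with L_k the Laguerre polynomial of degree k. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.laguerre import lagval


def exact_step_response(n, t):
    """Step response of n cascaded stages y(p) = p/(p+1): exp(-t) * L_{n-1}(t)."""
    coeffs = np.zeros(n)
    coeffs[-1] = 1.0  # selects the Laguerre polynomial of degree n-1
    return np.exp(-t) * lagval(t, coeffs)


def asymptotic_step_response(n, t, a=1.0, b=-1.0):
    """Leading term: pi^(-1/2) (nat)^(-1/4) exp((a^2+2b) t / 2a) cos(2 sqrt(nat) - pi/4)."""
    x = n * a * t
    envelope = np.pi ** -0.5 * x ** -0.25 * np.exp((a * a + 2.0 * b) * t / (2.0 * a))
    return envelope * np.cos(2.0 * np.sqrt(x) - np.pi / 4.0)


if __name__ == "__main__":
    t = np.linspace(0.5, 2.0, 7)
    print(exact_step_response(400, t))
    print(asymptotic_step_response(400, t))
```

With n = 400 and t between 0.5 and 2 the two curves should agree closely compared with the oscillation amplitude, which is of the order of 0.1 at the lower end of that range.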

A more accurate approximation to the gain $A(\omega)^n$ is
given by

$$A(\omega) = \left[ 1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right]^{1/2}$$

where the first three terms of equation (13) have been retained.
From this it follows that:

$$A(\omega)^n = \exp \left\{ \frac{n}{2} \left( \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right) \right\} = \exp \left[ \frac{n\,(2b + a^2)}{2\omega^2} \right] \exp \left\{ \frac{n\,(2d + b^2 + 2ac)}{2\omega^4} \right\}$$

from which it follows that the second approximation is obtained by
multiplying the first by the factor

$$\exp \left\{ \frac{n\,(2d + b^2 + 2ac)}{2\omega^4} \right\}.$$

If the frequency transformation $v = \omega \sqrt{t/na}$ is now made,
the first factor will as before be independent of n. Over the
range of integration where the integral is significant their
product can be removed from under the integral sign giving

$$V_n(t) = \pi^{-1/2}\, (nat)^{-1/4} \cos \left( 2\sqrt{nat} - \frac{\pi}{4} \right) \exp \left[ \frac{(a^2 + 2b)\,t}{2a} \right] \exp \left[ \frac{(2d + b^2 + 2ac)\,t^2}{2 a^2 n} \right]$$

$$\approx \pi^{-1/2}\, (nat)^{-1/4} \cos \left( 2\sqrt{nat} - \frac{\pi}{4} \right) e^{(a^2 + 2b)\,t / 2a} \left[ 1 + \frac{(2d + b^2 + 2ac)\,t^2}{2 a^2 n} + \cdots \right],$$

which is the equation (9) of Part I.

Band Pass Case - Impulsive Input

For simplicity let it be assumed that the gain characteristic
$A(\omega)$ has only one absolute maximum at $\omega = \omega_0$ on the
positive frequency range and that this is a second order maximum.
The response $V_n(t)$ can always be written in the form

$$V_n(t) = \frac{A(\omega_0)^n}{\pi} \operatorname{Re} \left\{ \int_0^\infty e^{\,n \log \frac{A(\omega)}{A(\omega_0)} + i n B(\omega) + i \omega t}\, d\omega \right\}.$$

In this form, $V_n(t)$ can again be interpreted as being proportional
to the sum of an infinite number of complex waves of amplitude

$$e^{\,n \log \frac{A(\omega)}{A(\omega_0)}}$$

with varying complex phase* given by

$$\varphi(\omega, t, n) = nB(\omega) + \omega t.$$

With this interpretation it is clear that the maximum contribution
to $V_n(t)$ will be given by those frequencies in the
neighborhood of $\omega_0$, where $\omega_0$ satisfies $A'(\omega) = 0$, and at values
of the time $t$ near $t_0$ at which the phase function $\varphi(\omega, t, n)$
is stationary for the maximum frequency $\omega_0$. Thus $t_0$ is given
by

$$t_0 = -nB'(\omega_0).$$

Since

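$$A(\omega_0) \neq 0 \quad \text{and} \quad A'(\omega_0) = 0$$

one can write for a suitably small neighborhood of $\omega_0$

(21)  $$\log \frac{A(\omega)}{A(\omega_0)} = \frac{A''(\omega_0)}{2A(\omega_0)} (\omega - \omega_0)^2 + \frac{A'''(\omega_0)}{6A(\omega_0)} (\omega - \omega_0)^3 + \cdots$$

*  "Phase" as used here differs from the way it is normally used
in engineering.

If we retain only the first term of this expansion, then for a
suitably restricted neighborhood of $\omega_0$, one has

(22)  $$A(\omega)^n = A(\omega_0)^n\, e^{\,n \log \frac{A(\omega)}{A(\omega_0)}} \approx A(\omega_0)^n \exp \left[ \frac{nA''(\omega_0)}{2A(\omega_0)} (\omega - \omega_0)^2 \right].$$

Similarly, for $\omega$ sufficiently near $\omega_0$

(23)  $$B(\omega) = B(\omega_0) + B'(\omega_0)(\omega - \omega_0) + \frac{B''(\omega_0)}{2} (\omega - \omega_0)^2.$$

Henceforth for simplicity, we shall write

$A = A(\omega_0)$, $A'' = A''(\omega_0)$, $B = B(\omega_0)$, $B' = B'(\omega_0)$,
$B'' = B''(\omega_0)$.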

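If these approximations are valid in the neighborhood
$(\omega_0 - \Delta,\ \omega_0 + \Delta)$, it follows that

$$V_n(t) = \frac{1}{\pi} \operatorname{Re} \left\{ \left[ \int_0^{\omega_0 - \Delta} + \int_{\omega_0 + \Delta}^{\infty} \right] A(\omega)^n\, e^{i[nB(\omega) + \omega t]}\, d\omega \right.$$

$$\left. +\ A^n \int_{\omega_0 - \Delta}^{\omega_0 + \Delta} \exp \left[ \frac{nA''}{2A} (\omega - \omega_0)^2 + i \left( nB + nB'(\omega - \omega_0) + \frac{nB''}{2} (\omega - \omega_0)^2 + \omega t \right) \right] d\omega \right\}.$$

Since $[A(\omega)]^n \to 0$ as $n \to \infty$, except near $\omega = \omega_0$, it follows as
before that the sum of the bracketed integrals can be made
negligibly small in comparison with the remaining one if n is
taken sufficiently large. Recalling that

$$t_0 = -nB'(\omega_0)$$

the remaining integral can be written as

$$V_n(t) = \frac{1}{\pi} \operatorname{Re} \left\{ A^n e^{i[nB + \omega_0 t]} \int_{\omega_0 - \Delta}^{\omega_0 + \Delta} \exp \left[ n \left( \frac{A''}{2A} + \frac{iB''}{2} \right) (\omega - \omega_0)^2 + i(t - t_0)(\omega - \omega_0) \right] d\omega \right\}.$$

Again the finite limits of integration can be replaced by $-\infty$
and $\infty$ since, for large n,

$$e^{\frac{nA''}{2A} (\omega - \omega_0)^2}$$

will be small except in the immediate neighborhood of $\omega_0$.
If one sets

$$\rho = -n \left( \frac{A''}{2A} + \frac{iB''}{2} \right), \qquad p = i(\omega - \omega_0), \qquad g = t - t_0,$$

then the remaining integral can be recognized as pair No. 710.0
of the Campbell and Foster Tables.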


Then  one  finds 


Vn(t)  =  —372"    Re     {{  An    expCinB+io)0t3    exp  [-(t-tQ)2] 


2n°/&  ( 


(  VP 


4p 


The result is equivalent to that given by equation (11)
of Part I. If $A(\omega_0)$ is greater than 1, it is thus seen that the
response will have a maximum value that builds up very rapidly
as n increases and would eventually force any system involving
vacuum tubes to overload.

It should be remarked that the above approximation
to the gain could only be expected to be a reasonable one for
fairly large values of n, since it represents a usually unsymmetric
gain characteristic by a symmetric function. A better
or second approximation can be obtained by keeping the second
term of the expansion of the logarithm in (21), and then taking
the first term of the expansion of

$$e^{\frac{nA'''}{6A} (\omega - \omega_0)^3},$$

where $A''' = A'''(\omega_0)$. This yields

$$A(\omega)^n \approx A^n\, e^{\frac{nA''}{2A} (\omega - \omega_0)^2} \left[ 1 + \frac{nA'''}{6A} (\omega - \omega_0)^3 \right].$$

The addition of the second term in the above expression
gives rise to an additional term in $V_n(t)$, provided
that the same phase approximation (23) is retained. The
resulting $V_n(t)$ is similar to (11) but the new envelope consists
of the old envelope plus $nA'''/6A$ times the third derivative
of the old envelope. The modulated frequency remains
the same but the phase is changed in a complicated manner.
(Compare pair 710.3 of the Campbell and Foster tables).


Unit  Step  Input 

In this case one can write

$$V_n(t) = \frac{1}{\pi} \operatorname{Re} \left\{ \int_0^\infty A(\omega)^n\, e^{i[nB(\omega) + \omega t - \pi/2]}\, \frac{d\omega}{\omega} \right\}.$$

As before the only significant frequencies are in the neighborhood
of $\omega = \omega_0$, and near this point the $\omega$ in the denominator
can be taken out of the integral as $1/\omega_0$ provided $\omega_0 \neq 0$. Thus
the result will be the same as for the impulsive input apart from
the factor $1/\omega_0$ if one makes $nB(\omega) - \pi/2$ correspond to $nB(\omega)$
in (11).

Low-Pass Case

It is clear that the analysis for this case in which
the equation $A'(\omega) = 0$ is satisfied for $\omega = 0$ can be carried
through in exactly the same manner as the band-pass case treated
previously. The resulting answer is capable of simplification,
however, if it is recalled that $B(\omega)$ for any physical network
is an odd function of $\omega$. This forces both $B(0)$ and $B''(0)$ to
be zero. The resulting formulae then become

a)     Impulsive  Input 


b)  Unit  Step  Input 
(24) 


A(0)n  e  W  A(0) 
  2n  A"(Cfr 


Tt 


3/2 


vn(t) 


A(o) 


n 


3/2  /2nA'  Ha) 
n      J  A(Gj 


,t 


(-(t-tQ)2A(o)) 
exp  j     2nA"(»)  jdt' 
This last expression involves an integral since it
is necessary to eliminate the pole at zero where $A(\omega)$ has its
maximum. This can be done by differentiating $V_n(t)$ with respect
to $t$, finding the asymptotic formula for $V_n'(t)$ as before
and then integrating to obtain (24).

Hamy's Expansions in the Band-Pass Case

The type of asymptotic expansions so far given for
the band-pass case were explicitly designed to represent $V_n(t)$
in the neighborhood of $t = t_0$ where $V_n(t)$ is a maximum. They
could in no sense be considered the true asymptotic expansions
for values $t \ll t_0$ or $t \gg t_0$. In particular their derivation
depended upon the fact that the time of maximum response was
related to the number of four terminal networks by means of
the equation

$$t_0 = -nB'(\omega_0),$$

so that as $n \to \infty$, $t_0 \to \infty$.

Other types of expansion are clearly possible.
Two obvious alternatives are:

(1)  Those valid for fixed n as $t \to \infty$;

(2)  Those valid for fixed t as $n \to \infty$.

The first of these will not be considered here since
they are of little interest as all of the four terminal networks
have been assumed to be absolutely stable. The interested reader
is referred to the book by Doetsch on Laplace Transformations
for expansions of this type.

Since the second type of expansion is of interest
here and is not to be found in most of the standard reference
works it will be discussed here briefly.

In a classic paper, M. Hamy* derived general expansions
of this type for complex integrals of the form

$$\int f(z)\, \varphi^n(z)\, dz$$

*  Journal de Mathematique, vol. 4, 6th series, 1908, page 203.

under a variety of hypotheses on $f(z)$ and $\varphi(z)$. These conditions
include the case where $\varphi(z)$ has a saddle point given
by the solution of $\varphi'(z) = 0$ and the result of this case is a
generalization of the often-used theorem of Fowler which one
finds in his book on statistical mechanics under the title of
the saddle point method.

More to the point, they also include the case
where $\varphi(z)$ has one or more maxima on the path of integration
at which $\varphi'(z) = 0$ provided that $f(z)$ admits a Taylor series
expansion about these points. In particular, then, if one
considers t as a fixed parameter they apply to the integral
of equation (1), with $c = 0$ and $\varphi(z) = y(p)$; $f(z) = e^{pt} v_0(p)$.
In  terms  of  our  notation,  one  finds  that: 

(a)  for an impulsive input with gain maxima at $\omega = \omega_0$

2An(cO  x 
VtJ  ~  nB'(a>°)   COS  rV  +  n  B(u,o):i  +  term  in  ^  * 

(b)  for a unit step input with gain maxima at $\omega = \omega_0 \neq 0$.

2An(w  )  , 

Vn(t)  ?a    COS  [V  +  nB^o]^  +  termS  in  —  ' 

■  v  o'  o  n 

It is interesting to note that these formulae indicate
a dependence upon $1/n$ instead of $1/\sqrt{n}$ as in the case of the
previous expansion. These formulae can be thought of as representing
the response in the band-pass case for any fixed $t$,
$t \ll t_0$.

Appendix I

Certain remarks of Aurel Wintner* on the justification
of the principle of stationary phase are pertinent enough to
the above discussion to bear repetition here. In order for the
integral

(25)  $$S = \int f(x)\, e^{ip\varphi(x)}\, dx$$

to be asymptotically represented as $p \to \infty$ by the formula
(Cf. Lamb, Hydrodynamics p 395)

(26)  $$S \sim \frac{\sqrt{2\pi}\, f(a)}{\sqrt{p\, |\varphi''(a)|}}\; e^{i[p\varphi(a) \pm \pi/4]}$$

where $\varphi'(a) = 0$ and where the upper or lower sign is to be
taken according as $\varphi''(a)$ is positive or negative, it is
evident that two things are sufficient.

(1)  The contribution to the integral outside a small interval
around the stationary value a of $\varphi(x)$ must decrease more
rapidly as a function of p than the one obtained in the
neighborhood of a;

(2)  The asymptotic formula given above must adequately represent
the behavior of the contribution to the integral
from the neighborhood of the stationary value a.
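
Formula (26) can be tried on an integral with a known closed form. The sketch below is an added illustration with assumed choices: f(x) = exp(-x^2) and φ(x) = x^2, so that the stationary point is a = 0 with φ''(0) = 2, and the exact value of the integral over the whole line is sqrt(π/(1 - ip)). The function names and the integration grid are hypothetical.

```python
import numpy as np


def stationary_phase(f_a, phi_a, phi2_a, p):
    """Leading term sqrt(2 pi) f(a) / sqrt(p |phi''(a)|) * exp(i [p phi(a) +/- pi/4])."""
    sign = 1.0 if phi2_a > 0 else -1.0
    amplitude = np.sqrt(2.0 * np.pi / (p * abs(phi2_a))) * f_a
    return amplitude * np.exp(1j * (p * phi_a + sign * np.pi / 4.0))


def numeric_integral(p, half_width=6.0, samples=400_001):
    """Trapezoidal estimate of the integral of exp(-x^2) exp(i p x^2) over the real line."""
    x = np.linspace(-half_width, half_width, samples)
    y = np.exp(-x ** 2) * np.exp(1j * p * x ** 2)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


p = 50.0
exact = np.sqrt(np.pi / (1.0 - 1j * p))          # closed form for this test case
approx = stationary_phase(1.0, 0.0, 2.0, p)      # f(0) = 1, phi(0) = 0, phi''(0) = 2

if __name__ == "__main__":
    print(exact, approx, numeric_integral(p))
```

For this test case the relative error of the leading term is close to 1/(2p), that is, one order in 1/p beyond the leading term.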

Now, if, on any closed interval I, $\varphi'(x)$ is continuous
and has no zeros, and if $\varphi(x)$ is strictly monotone in this interval,
then $z = \varphi(x)$ can be introduced as a variable of integration
on that interval, transforming S into

$$S = \int f(x)\, e^{ip\varphi(x)}\, dx = \int f[\varphi^{-1}(z)]\, e^{ipz}\, dz.$$

*  Method of Stationary Phase, Journal of Math. & Physics,
vol 24, no 3-4 - 1945

If, in addition to the above, $\varphi'(x)$ and $\varphi''(x)$ are continuous
and if $f(x)$ and $f'(x)$ exist and are continuous, this last
integral can be integrated by parts, giving

$$S = \left[ \frac{f[\varphi^{-1}(z)]\, e^{ipz}}{ip} \right] - \frac{1}{ip} \int e^{ipz}\, \frac{d}{dz} f[\varphi^{-1}(z)]\, dz,$$

and showing that on any such interval I,

$$S = O(1/p).$$

Thus, condition (1) will be satisfied if, in the
neighborhood of the stationary point a, the contribution to
the integral is greater than $O(1/p)$.

This is clearly the case when the asymptotic formula
(26) is valid, since there the dependence on p is as $1/\sqrt{p}$.
It can be shown that (26) is valid whenever


-1 


tp(ct)  =  0,  <ptf(a)  f  0  and  <p«  •  (x)  and  f|> 

are of bounded variation in the neighborhood of the stationary
value. Thus, to recapitulate, under these conditions, the
maximum contribution comes from the stationary point and depends
on p as $1/\sqrt{p}$, while the points which are not near the stationary
point contribute terms depending upon p only as $1/p$.

To conclude this brief appendix, it should be remarked
that Wintner gives an extension of (10) which is valid under
the same condition on $f[\varphi^{-1}(z)]$ if the first n derivatives of
$\varphi(x)$ vanish at some point a while $\varphi^{(n+1)}(x)$ does not. These results
could be used to extend the treatment of the high-pass case
given above to the cases in which $a^2 + 2b = 0$, etc.

Electronic Methods in Telephone Switching

C.  E.  Shannon 

In  the  recent  development  of  electronic  digital  computing  machines  various  new 
tubes  and  other  electronic  devices  have  been  designed  which  may  be  of  use  in 
machine  switching.  In  particular  the  "selectron"  tube  developed  by  R.  C.  A.  and  the 
mercury  acoustic  delay  tank  provide  large  cheap  memory  devices  in  which  information 
can  be  registered  or  read  off  in  electronic  time  intervals  (of  the  order  of 
microseconds).  Since  one  of  the  chief  functions  of  the  relays  and  switches  in  a 
telephone  exchange  is  that  of  memory  (e.g.  the  relays  remember  which  calling  and 
called  lines  should  be  connected  together)  it  is  worth  while  considering  the  possibility 
of  using  such  tubes  to  replace  ordinary  electro-mechanical  switching  equipment. 

Suppose we have an exchange (or set of exchanges) serving n subscribers and that
the exchange can handle a peak load of m simultaneous conversations. These may be
between any m pairs of the subscribers. Thus the exchange must be capable of
assuming as many different states as there are ways of selecting m pairs of objects from n.
This can be done in

$$\frac{n!}{m!\, 2^m\, (n - 2m)!}$$

different ways. For n and m large the logarithm of this is approximately 2m log n.
If the logarithm is to the base ten then this is the required memory capacity of the
exchange measured in decimal digits. If the logarithmic base is two the units are
binary digits. A single two-position relay has a capacity of log 2 units (one binary
digit or .30103 decimal digits), while 5 relays have 5 log 2 units. A 10 x 10 crossbar
switch has a capacity of 10 log 10, while a single commutator on a panel has capacity
log r, where r is the number of vertical positions of the brushes. Hence the number
of relays required for a pure relay exchange would be

$$\frac{2m \log n}{\log 2},$$

the number of 10 x 10 crossbars would be

$$\frac{2m \log n}{10 \log 10},$$

etc.  To  these  estimates  must  be  added  the  losses  due  to  inefficient  use  of  the  memory 
and  also  the  memory  of  equipment  used  for  functions  other  than  merely  remembering 
which  connections  are  being  held  at  a  given  time. 
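
The counting argument is easy to evaluate numerically. The following sketch is an added illustration (the function names are hypothetical); it computes the logarithm of n!/(m! 2^m (n-2m)!) with the log-gamma function and converts it to a minimum number of two-position relays. Losses and auxiliary equipment are ignored, so the figures are lower bounds on memory, not equipment estimates.

```python
import math


def log10_states(n: int, m: int) -> float:
    """log10 of n! / (m! * 2**m * (n - 2m)!), the number of ways to pick m disjoint pairs."""
    if m < 0 or 2 * m > n:
        raise ValueError("need 0 <= 2m <= n")
    natural_log = (math.lgamma(n + 1) - math.lgamma(m + 1)
                   - m * math.log(2.0) - math.lgamma(n - 2 * m + 1))
    return natural_log / math.log(10.0)


def relays_needed(n: int, m: int) -> int:
    """Smallest number of two-position relays whose states can label every connection pattern."""
    return math.ceil(log10_states(n, m) / math.log10(2.0))


if __name__ == "__main__":
    n, m = 10_000, 1_000
    print(log10_states(n, m), 2 * m * math.log10(n), relays_needed(n, m))
```

Because n!/(n-2m)! is at most n^(2m) and m! 2^m is at least 1, the exact logarithm never exceeds 2m log n; the approximation is closest when log n is large compared with log m.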

An  ordinary  relay  is  capable  of  remembering  (by  a  holding  circuit)  one  binary 
digit.  A  pair  of  vacuum  tubes  in  a  flip-flop  circuit  has  the  same  memory  capacity. 
The  cost  of  these  is  of  comparable  magnitude,  and  thus  if  one  designed  an  electronic 
telephone  exchange  by  merely  changing  relays  to  equivalent  vacuum  tube  circuits  the 
chief  advantage  of  the  electronic  circuit  would  be  one  of  speed,  an  improvement  of 
order $10^3$. In many cases this could produce a reduction of cost since frequently many
identical  units  of  a  certain  type  must  be  supplied  because  the  individual  units  are  slow. 
This  is  apt  to  be  the  case  with  units  which  are  associated  with  the  beginning  or  end  of 
calls  but  need  not  be  used  during  the  conversation.  On  the  other  hand  equipment  to 
be used throughout the call would offer less advantage under this tube-for-relay
replacement since the expected duration of calls is long compared to electronic times.


The  newer  electronic  memory  devices,  however,  change  this  picture  considerably. 
A  selectron  tube  (when  these  tubes  are  in  production)  may  be  expected  to  cost  $100  or 
less  depending  on  the  demand.  It  is  capable  of  holding  4096  binary  digits,  giving  a 
cost  per  binary  digit  of  the  order  of  2.5  cents,  while  the  cost  of  the  equivalent  relay 
may  be  of  the  order  of  2.5  dollars.  Mercury  delay  lines  can  store  information  at  a 
comparable  cost.  Thus  it  is  not  impossible  that  a  reduction  of  the  order  100  to  1  in 
switching  equipment  cost  might  be  possible  by  the  use  of  electronic  devices,  even  in 
the  parts  where  information  must  be  stored  for  long  periods  of  time. 

An  indication  of  how  such  tubes  may  be  used  is  given  in  the  attached  figure. 
Fig.  1  is  a  block  diagram  of  a  simplified  exchange.  The  calling  parties  are  connected 
to  an  electronic  commutator  which  samples  the  speech  signals  periodically  and  puts 
the  various  lines  in  the  time  division  multiplex.  The  called  parties  are  also  connected 
in  time  division  multiplex  to  a  single  channel  by  means  of  an  electronic  commutator 
or  distributor.  The  function  of  the  middle  part  is  to  rearrange  the  samples  in  such  a 
way  as  to  provide  any  desired  interconnection  between  calling  and  called  parties.  This 
is  done  by  dividing  the  sampling  period  into  two  equal  parts.  During  the  first  half  the 
signal  plate  of  the  upper  selectron  is  connected  by  gate  1  into  the  calling  line 
multiplex  channel.  Its  windows  are  caused  to  open  in  sequence.  Thus  at  the  end  of 
the  first  half-cycle  the  first  samples  of  all  the  incoming  channels  have  been  written  on 
the  face  of  the  tube  in  their  regular  order.  During  the  second  half-cycle  gates  1  and  3 
are  closed  and  gates  2  and  4  are  opened.  Thus  the  output  of  the  selectron  is  fed  into 
the  called  line  multiplex  and  the  windows  of  the  selectron  are  controlled  by  the  other 
selectron tube 2. This tube has registered, in a suitable notation, the number of the
called line desired by each calling line. The windows of this tube are opened
sequentially by the cycling unit, and the numbers registered there control the windows
on tube 1, allowing the sample from calling channel 1 to go into the proper place in
the called line TDM.
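
The two half-cycles described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function name `exchange_frame`, the use of plain lists for the faces of the two tubes, and the marking of idle lines with `None` are all hypothetical and are not taken from the memorandum or its figure.

```python
def exchange_frame(samples, called_line_for):
    """Simulate one sampling period of the two-tube exchange (illustrative only).

    samples[i]         -- speech sample arriving from calling line i
    called_line_for[i] -- called-line number registered for calling line i,
                          or None if that line is idle (the role of tube 2)
    """
    n = len(samples)

    # First half-cycle: the windows of tube 1 open in sequence, so the
    # incoming samples are written on its face in their regular order.
    tube1 = list(samples)

    # Tube 2 holds, for each calling line, the called line it wants.
    # Invert that register so each outgoing time slot knows which window
    # of tube 1 must be opened (an assumed reading of the control path).
    window_for_slot = {
        called: calling
        for calling, called in enumerate(called_line_for)
        if called is not None
    }

    # Second half-cycle: the cycling unit steps through the outgoing
    # time slots and tube 1 is read out through the selected windows.
    outgoing = [0.0] * n
    for slot in range(n):
        window = window_for_slot.get(slot)
        if window is not None:
            outgoing[slot] = tube1[window]
    return outgoing


# Calling line 0 wants called line 2, line 1 wants line 0, and so on.
frame = exchange_frame([0.1, 0.2, 0.3, 0.4], [2, 0, 3, 1])
```

Changing a connection in this model means rewriting one entry of the register, which mirrors the point of the text: the interconnection pattern is stored information rather than a set of physical contacts.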

By  a  more  elaborate  system  it  is  possible  to  make  use  of  the  fact  that  only  a  small 
fraction  of  the  lines  will  be  busy  at  a  given  time,  as  is  done  in  ordinary  relay 
switching.  This  can  be  achieved  by  only  supplying  enough  places  in  the  distributors 
for  the  peak  load.  When  a  call  originates  the  calling  and  called  parties  are  assigned 
idle  spaces  in  the  distributor.  The  place  assigned  to  the  called  party  is  registered  in 
the  selectron  register  corresponding  to  the  place  assigned  to  the  calling  party. 


Some  Generalizations  of  the  Sampling  Theorem 

We have seen that a function of time f(t) containing
no frequencies over W cycles per second can be described by
giving its value at Nyquist intervals (spaced 1/2W seconds apart).
It can be reconstructed from these samples using the basic
function sin 2πWt / 2πWt, together with the same function shifted
by integer numbers of Nyquist intervals. We now consider some
generalizations of this result.

In the first place the particular function
sin 2πWt / 2πWt is by no means necessary for the reconstruction.
In fact any function φ(t) which contains all frequencies up to
W is satisfactory. More precisely the spectrum of φ(t) should
not vanish over any finite set of frequencies (set of positive
measure) up to W. If φ(t) satisfies this condition the original
function f(t) can be reconstructed using φ(t) and its shifted
images φ(t + k/2W). That is, coefficients a_k can be found such
that

    f(t) = \sum_{k=-\infty}^{\infty} a_k \varphi(t + k/2W).

In general the coefficients are not found as easily as in the
special case where φ(t) = sin 2πWt / 2πWt (when they are merely
the values of f(t) at the Nyquist points) but they may be
calculated as follows. Let F(ω) be the spectrum of f(t) and
Φ(ω) be the spectrum of φ(t). Expand the function F(ω)/Φ(ω) in
a Fourier series using -W to +W as the fundamental interval.
Thus

    \frac{F(\omega)}{\Phi(\omega)} = \sum_k a_k e^{i k \omega / 2W}

or

    F(\omega) = \sum_k a_k \Phi(\omega) e^{i k \omega / 2W}.

Taking the transform of the equation we obtain the desired
expansion

    f(t) = \sum_k a_k \varphi(t + k/2W).

The coefficients in the expansion can therefore be determined as
the coefficients of a Fourier series expansion of F(ω)/Φ(ω). In
general the functions φ(t + k/2W) will not form an orthogonal set
and therefore the energy in f(t) cannot be found from Σ a_k² as it
was in the simple case where φ(t) = sin 2πWt / 2πWt.

A physical method of performing this expansion can
also be given. Consider a filter which gives the output
sin 2πWt / 2πWt when the input is φ(t). If the function f(t) is
passed through this filter the amplitudes of the output at
Nyquist intervals will be the desired coefficients. This is
true since this output can be considered as expanded in the
functions sin 2πWt / 2πWt with the amplitudes as coefficients,
and the inverse filter would restore the original function and
replace each of these functions by φ(t) at the corresponding
Nyquist point.

A  function  f (t)  can  also  be  determined  from  a  knowledge 
of  its  value  and  derivative  at  alternate  Nyquist  points: 


We  have  here  the  same  number  of  measurements  per  second,  2W, 
but  half  of  these  are  ordinates  of  f(t)  and  half  are  derivatives. 
The  reconstruction  of  f(t)  from  these  values  can  be  carried  out 
simply  using  two  basic  functions: 

    \varphi_1(t) = \frac{\sin^2 \pi W t}{(\pi W t)^2}

    \varphi_2(t) = \frac{t \, \sin^2 \pi W t}{(\pi W t)^2}

Both of these lie entirely within the band W and φ_1 has the
property that it and its first derivative vanish at alternate
Nyquist points (except for t = 0 where the function is 1 and
its first derivative 0). Likewise φ_2 and φ_2' vanish at alternate
Nyquist points except at t = 0 where φ_2 = 0 and φ_2' = 1. Thus
we can fit the ordinates of the original function f(t) using φ_1
and its shifted images (shifted by two Nyquist intervals). The
derivatives of f(t) are fitted using φ_2 and its shifted images.
Due to the vanishing of these functions none of the fittings
interfere. The function constructed by this process must lie
within the band and have the same values and derivatives as the
original function f(t) at alternate Nyquist points. That there
is only one such function can be shown by arguments similar to
those used in the basic sampling theorem, generalized by breaking
down the spectrum into an even and an odd part.
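
The reconstruction from values and derivatives can be checked numerically. The following is a minimal sketch, assuming W = 1, a test function f(t) = (sin 0.8πt / 0.8πt)² chosen only because it lies inside the band and decays quickly, and a truncation of the infinite sum to a few hundred terms. The basic functions are taken as sin² πWt / (πWt)² and t times that function; all Python names are illustrative.

```python
import numpy as np

W = 1.0  # band limit in cycles per second (assumed value)
B = 0.8  # the test function occupies frequencies up to B < W


def f(t):
    """Band-limited test function; np.sinc(x) is sin(pi x) / (pi x)."""
    return np.sinc(B * t) ** 2


def df(t):
    """Analytic first derivative of f, with the t = 0 limit handled."""
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t == 0.0, 1.0, t)
    d_sinc = np.where(
        t == 0.0, 0.0, (np.cos(np.pi * B * t) - np.sinc(B * t)) / safe_t
    )
    return 2.0 * np.sinc(B * t) * d_sinc


def phi1(t):
    """Carries the ordinates: value 1 and slope 0 at t = 0."""
    return np.sinc(W * t) ** 2


def phi2(t):
    """Carries the derivatives: value 0 and slope 1 at t = 0."""
    return t * np.sinc(W * t) ** 2


def reconstruct(t, n_terms=400):
    """Sum over alternate Nyquist points t_k = k / W, truncated to |k| <= n_terms."""
    k = np.arange(-n_terms, n_terms + 1)
    t_k = k / W
    shifted = t[:, None] - t_k[None, :]
    return (f(t_k) * phi1(shifted) + df(t_k) * phi2(shifted)).sum(axis=1)


t = np.linspace(-3.0, 3.0, 61)
max_err = float(np.max(np.abs(reconstruct(t) - f(t))))
```

With these choices the only discrepancy between `reconstruct(t)` and `f(t)` comes from truncating the sum, so `max_err` should be very small; a slowly decaying test function would need many more terms.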
It is possible to carry this further and determine a
function from knowledge of its value and first (n - 1)
derivatives at points separated by n Nyquist intervals. In
this case the basic functions are

    \varphi_s(t) = \frac{t^s}{s!} \cdot \frac{\sin^n (2\pi W t / n)}{(2\pi W t / n)^n}, \qquad s = 0, 1, \ldots, n-1.


These functions possess the properties:

1. They lie within the band W.

2. They vanish at t = nK/2W, K = ±1, ±2, ...
(that is, at n-th Nyquist points) and so do their
1st, 2nd, ..., (n-1)-th derivatives.

3. At t = 0, all derivatives of φ_s vanish except the s-th
derivative, which is 1.

Consequently we can reconstruct f(t) by using φ_s to
adjust the s-th derivatives (s = 0, 1, ..., n-1) and these
adjustments will not interfere.

The functions φ_s and their spectra are shown in Fig. 1
for the cases n = 1, 2, 3.
The Normal Ergodic Ensembles of Functions

Among  the  possible  probability  distributions  in  a  one- 
dimensional  space  certain  ones  are  of  special  importance  because 
of  their  simple  mathematical  properties  and  frequent  occurrence 
in  the  physical  world.    The  most  important  of  these  is  the 
normal  or  Gaussian  distribution  with  a  density  function: 

    \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \frac{x^2}{\sigma^2} \right)

In an n-dimensional space the most important distribution
function is an n-dimensional generalization of this, the
n-dimensional normal distribution:

    \frac{|a_{ij}|^{1/2}}{(2\pi)^{n/2}} \exp\left( -\frac{1}{2} \sum_{i,j} a_{ij} x_i x_j \right)

Here a_ij is the associated quadratic form and |a_ij| the
determinant of this form. This form is positive definite and
the surfaces of constant probability are found by setting
the argument of the exponential function equal to a constant

    \sum_{i,j} a_{ij} x_i x_j = C

and are therefore coaxial ellipsoids in the space. The directions
of the axes of these ellipsoids are those of the eigenvectors
of the form a_ij and the lengths are inversely proportional
to the square roots of the corresponding eigenvalues. By a
rotation of axes the new coordinate system can be lined up with
these directions and the distribution function reduced to

    \frac{(\lambda_1 \lambda_2 \cdots \lambda_n)^{1/2}}{(2\pi)^{n/2}} \exp\left( -\frac{1}{2} \sum_i \lambda_i y_i^2 \right)

where the λ_i are the (positive) eigenvalues and the y_i are the
new coordinates. The form a_ij, being positive definite, has an
inverse A_ij which is also positive definite with eigenvalues 1/λ_i.

The  properties  of  the  n-dimension  normal  distribution 
which  give  it  particular  mathematical  importance  are  the 
following. 

1. If x_i and y_i are two chance vector variables, which
are independent and distributed according to n-dimensional
normal distributions with quadratic forms a_ij and b_ij (inverses
A_ij and B_ij), then the chance vector variable z_i = x_i + y_i is
also distributed normally with the form c_ij whose inverse is

    C_{ij} = A_{ij} + B_{ij}.

2. If x_i is a normally distributed vector variable and
y_j = Σ_i r_ji x_i is a vector variable which is a linear operation
on x_i (possibly of smaller dimension than n) then y_j is normally
distributed with the inverse form

    B_{ij} = \sum_{s,t} r_{is} r_{jt} A_{st}.

3. Under certain quite broad conditions the resultant of
a large number of small chance vector variables x^(s) (s = 1, 2, ..., N)
with arbitrary distribution functions, which are independent,
gives a normal distribution for

    x_i = \sum_{s=1}^{N} x_i^{(s)}

with

    B_{ij} = \sum_{s=1}^{N} \overline{x_i^{(s)} x_j^{(s)}}

providing no term of the sum contributes more than a small
fraction to any B.

4. If the a priori probabilities for each of two
independent vectors x_i and y_i are both normal, the a posteriori
probability of x_i when we know the sum x_i + y_i = z_i is
normally distributed (about a displaced mean, however).

5. The mean value of x_i x_j for x_i normal is given by

    \overline{x_i x_j} = A_{ij}.
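
Properties 2 and 5 lend themselves to a quick numerical check: if the inverse form A_ij is the matrix of mean products of x, a linear operation y = R x should have mean products R A Rᵀ. The sketch below assumes zero means and uses arbitrary illustrative matrices A and R; the variable names and the sample size are choices made here, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse form A_ij of x, i.e. its matrix of mean products (assumed values,
# chosen to be positive definite).
A = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.0, 0.3],
              [0.0, 0.3, 0.5]])

# Linear operation taking the 3-dimensional x to a 2-dimensional y.
R = np.array([[1.0, -1.0, 0.5],
              [0.0,  2.0, 1.0]])

# Draw zero-mean normal vectors x and apply y_i = sum_s R[i, s] x_s.
x = rng.multivariate_normal(np.zeros(3), A, size=200_000)
y = x @ R.T

# Property 2: B_ij = sum over s, t of r_is r_jt A_st.
B_predicted = R @ A @ R.T

# Property 5 applied to y: the mean value of y_i y_j should approach B_ij.
B_sampled = (y.T @ y) / len(y)
```

The two matrices agree only up to sampling error, which shrinks roughly as one over the square root of the number of samples.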

Among  the  many  possible  ergodic  ensembles  of  functions 
fa(t)  there  is  also  a  certain  class  of  particular  mathematical 
and  physical  importance.    This  class  of  ensembles  can  be  con- 
sidered a  generalization  of  the  n-dimensional  normal  distribution 
to  infinite  dimensional  function  spaces  ergodic  under  trans- 
lations in  time.    We  shall  call  these  normal  ergodic  ensembles 
of functions. They are completely specified by giving their
power spectra P(ω) or their autocorrelation functions A(t),
which are the Fourier transforms of the power spectra. The
normal ergodic ensembles can be defined in various ways. They
occur physically when we pass a thermal noise through a filter,
shaping the power spectrum to P(ω) = |Y(ω)|², Y(ω) being the
admittance of the filter.


In the literature on noise these ensembles are often
treated in a loose, somewhat illogical fashion by using either
of two "representations." The first representation is

    \sum_{n=0}^{\infty} \sqrt{P(n \Delta f) \, \Delta f} \, \cos(n \Delta f \, t + \theta_n).

The θ_n are all uniformly and independently distributed over all
values from 0 to 2π. This representation amounts to making the
noise the sum of a large number of small sinusoidal waves with
random phases, and amplitudes adjusted to give the proper power
density in any small frequency range. The frequency increment
between adjacent waves Δf is supposedly very small and in use
one evaluates any desired statistic of this set of functions and
determines the limit approached by this statistic as Δf → 0.
This limit is taken to be the desired statistic of the normal
ergodic ensemble. The second representation is similar but uses
normally distributed amplitudes a_n whose variance σ² is equal
to P(ω):

    \sum_n a_n \Delta f \, \cos(n \Delta f \, t + \theta_n).

Actually these "representations" will not give the
correct answer in all cases. For example, if we ask what
fraction of the functions in the representation ensemble
are periodic, we find that all are, so the probability is unity,
and the limit as Δf → 0 is also therefore unity, while almost
none of the functions in the ergodic normal ensemble are periodic.
However it can be shown that if we restrict ourselves to what we
have called physical statistics, the answer will be identical;
the normal ergodic ensemble is the physical limit of either of
the above ensembles as Δf → 0.

A  more  logical  definition  of  a  normal  ergodic  ensemble 
can  be  given  as  follows.    We  divide  the  frequency  range  up  into 
unit  intervals  and  construct  the  sequence  of  "flat"  ensembles 
for  these  intervals.    These  will  be  given  by 

    \sum_n a_n \sin nt.

These  ensembles  are  passed  through  shaping  filters  to  give  the 
proper  power  spectrum  in  the  interval  in  question  and  the  results 
added. 

The  normal  ergodic  ensembles  have  properties  analogous 
to  the  n-dimensional  normal  distributions  which  we  have  given. 
We  have 

Theorem: The sum of two functions f_α(t) + g_β(t), where f and g
are from normal ergodic ensembles with spectra P_1
and P_2, is normal ergodic with spectrum P_1 + P_2.

Theorem: The output of any linear invariant transducer driven
by a normal ergodic ensemble is normal ergodic with
spectrum |Y(ω)|² P(ω).

Theorem: Any finite dimensional linear operation on a normal
ergodic ensemble gives a normally distributed vector.
Systems Which Approach the Ideal as P/N → ∞

We will show that it is possible to construct an
instantaneous system, for sufficiently large P/N, for transmitting
a sequence of binary digits such that the frequency of errors
is arbitrarily small and the power required only slightly
greater in db than the ideal for the corrected rate of
transmission. More precisely we have the

Theorem: Given any ε > 0 and δ > 0 we can transmit binary digits
on an instantaneous basis with frequency of errors
< ε and corrected rate of transmission

    R > W \log \left[ 1 + (1 - \delta) \frac{P}{N} \right].

The system to be used is of PCM type with an extremely large
number of amplitude levels. Let there be 2^s levels, and number
them with a binary notation, but in the Stibitz type code, so
that only one binary digit changes on going to an adjacent
level. If we are in error by d levels, at most d binary digits
of the s will be incorrect. If there are many levels in the σ
distance (√N) of the noise the expected number of errors will
be approximately

2 


We take P/N large enough so that εs > σ. Thus the frequency of
errors in our final result will be < ε. The levels should not
be spaced uniformly but according to the density of a normal
distribution. If this is done the received signal will be
nearly Gaussian with σ = √(P + N) and the corrected rate of
transmission

    R > W \log \left[ 1 + (1 - \delta) \frac{P}{N} \right].
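
The level-numbering property used in the argument above (adjacent levels differ in one binary digit, so an error of d levels disturbs at most d digits) can be illustrated with the reflected binary code. Treating that code as the "Stibitz type code" of the text is an assumption made for this sketch, and the function names are hypothetical.

```python
def stibitz_code(level: int) -> int:
    """Reflected binary code word for a level (assumed stand-in for the Stibitz type code)."""
    return level ^ (level >> 1)


def digit_errors(sent_level: int, received_level: int) -> int:
    """Number of binary digits that differ between the two code words."""
    return bin(stibitz_code(sent_level) ^ stibitz_code(received_level)).count("1")


s = 6                      # number of binary digits, so there are 2**s levels
levels = range(2 ** s)

# Adjacent levels always differ in exactly one digit.
adjacent_ok = all(digit_errors(k, k + 1) == 1 for k in range(2 ** s - 1))

# An error of d levels never corrupts more than d digits.
bounded_ok = all(
    digit_errors(a, b) <= abs(a - b) for a in levels for b in levels
)
```

The second property follows from the first: moving d levels is d single-step moves, each changing one digit, so at most d digits can end up different.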
Theorems on Statistical Sequences

If it is possible to go from any state with P > 0
to any other along a path of probability p > 0, the system is
ergodic and the strong law of large numbers can be applied.
Thus the number of times a given path p_ij in the network is
traversed in a long sequence of length N is about proportional
to the probability of being at i and then choosing this path,
P_i p_ij N. If N is large enough the probability of percentage
error ± δ in this is less than ε, so that for all but a set of
small probability the actual numbers lie within the limits

    (P_i p_{ij} \pm \delta) N.

Hence the probability p of nearly all sequences (those lying
within the limits ± δ) is given by

    p = \prod p_{ij}^{(P_i p_{ij} \pm \delta) N}

and (log p)/N is limited by

    \frac{\log p}{N} = \sum (P_i p_{ij} \pm \delta) \log p_{ij}

or

    \left| \frac{\log p}{N} - \sum P_i p_{ij} \log p_{ij} \right| < \eta.

Thus we have:

Theorem: For almost all sequences

    \lim_{N \to \infty} -\frac{\log p}{N} = H = -\sum P_i p_{ij} \log p_{ij}

where p is the probability of the sequence having the block
of length N starting at the first position.
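
The theorem can be illustrated numerically for a small chain. The sketch below is not from the memorandum: the two-state transition matrix, the seed, and the sequence length are assumed values, and logarithms are taken to base 2 so that H comes out in binary digits per symbol.

```python
import numpy as np

# Transition probabilities p_ij (assumed values; every entry is positive,
# so any state can be reached from any other and the chain is ergodic).
p = np.array([[0.9, 0.1],
              [0.5, 0.5]])


def stationary(p):
    """State probabilities P_i solving P = P p together with sum(P) = 1."""
    n = p.shape[0]
    a = np.vstack([p.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(a, b, rcond=None)[0]


P = stationary(p)

# H = - sum over i, j of P_i p_ij log p_ij.
H = float(-np.sum(P[:, None] * p * np.log2(p)))

# Draw one long sequence and accumulate the logarithm of its probability.
rng = np.random.default_rng(1)
N = 20_000
state = rng.choice(len(P), p=P)
log_prob = 0.0
for _ in range(N):
    nxt = rng.choice(p.shape[1], p=p[state])
    log_prob += np.log2(p[state, nxt])
    state = nxt

empirical = -log_prob / N   # should lie close to H for a typical sequence
```

For this chain the state probabilities are 5/6 and 1/6, and a single long sequence already gives a value of -(log p)/N close to H, which is what "for almost all sequences" asserts.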

Thus for all but a set of blocks of probability < ε
and for N large enough

    (H - \eta) N < -\log p < (H + \eta) N

    \sum p (H - \eta) N < -\sum p \log p < \sum p (H + \eta) N

where we have summed over all but the set of small probability.

    \sum p (H + \eta) N = (H + \eta) N \sum p \le (H + \eta) N

and

    \sum p (H - \eta) N = (H - \eta) N \sum p \ge (H - \eta) N (1 - \varepsilon).

For the set of small probability


•I  p  log  p 


^  log  ^ 


since this is maximized for Σ p = ε by making all p equal, and
the  number  of  them  1  -Jj  •    But  this  is  dominated  by 


•  l  P  log  p|   £    |«W  lo«  | 


1  •» 


with ε as small as desired for sufficiently large N.
Hence this does not affect the sum in the limit as N → ∞ and
we have the

Theorem:

    \lim_{L \to \infty} -\frac{1}{L} \sum p(B_L) \log p(B_L) = H

where p(B_L) is the probability of block B_L of length L, and
the sum is over all possible blocks.

We now prove the

Theorem:

    H = -\sum p(B_i, S_j) \log p_{B_i}(S_j)
      = \lim_{L \to \infty} -\sum q(B_i, S_j) \log q_{B_i}(S_j)

where p(B_i, S_j) is the probability of block B_i followed by S_j and
p_{B_i}(S_j) is the conditional probability of S_j after the block B_i
is known to occur. q(B_i, S_j) is the probability when B_i is
computed on the basis of any initial state probabilities, not
necessarily the proper ones, and q_{B_i}(S_j) the corresponding
conditional probabilities.

The first equality is true since we may sum first on
all B_i leading to a given state K. The terms p_{B_i}(S_j) are then
all equal to p_Kj and the terms p(B_i, S_j) sum to P_K p_Kj, which
gives the desired result.

If the q's are used, the q_{B_i}(S_j) are still p_Kj, where
K is the state in which B_i ends, and

    \sum q(B_i, S_j) \to p_{Kj} \sum p(B_i)

since any initial distribution tends toward equilibrium.

We have shown that apart from a set of small probability,
the probabilities of blocks of length M lie within the limits

    2^{-(H + \delta) M} < p < 2^{-(H - \delta) M}

where δ can be made small by taking M large enough. Let the
maximum number of blocks of length M when we delete a set of
measure ε be Q_ε(M). Then:


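    \sum_{\text{remaining set}} p \ge 1 - \varepsilon

    Q_\varepsilon(M) \, p_{\max} = Q_\varepsilon(M) \, 2^{-(H - \delta) M} \ge 1 - \varepsilon

    \log Q_\varepsilon(M) \ge (H - \delta) M + \log(1 - \varepsilon)

Hence

    \lim_{M \to \infty} \frac{\log Q_\varepsilon(M)}{M} = \varphi(\varepsilon) \ge H.

Similarly

    1 \ge \sum p \ge Q_\varepsilon(M) \, p_{\min}

from which we obtain

    \log Q_\varepsilon(M) \le (H + \delta) M

and

    \varphi(\varepsilon) \le H.

Hence we have

Theorem: φ(ε) = H for ε ≠ 0, 1.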

The fact that for large M nearly all blocks have a
probability limited by

    \left| \frac{\log p}{M} + H \right| < \delta

does not imply that these probabilities approach equality.
In fact they will generally diverge from one another, but the
db range becomes small compared to M, since for p's satisfying
this inequality

    \frac{\log p_{\max}}{M} - \frac{\log p_{\min}}{M} = \frac{1}{M} \log \frac{p_{\max}}{p_{\min}} < 2\delta.

It is possible to show, however, that there exists among the
blocks of length M a subset, all of equal probability, which
have the same growth with M as the set including all blocks
except those of small probability totaling less than ε: namely,
the subset will contain more than 2^{(H - δ)M} elements with δ
arbitrarily small.

Consider all blocks beginning in a given state, say
state 1, and ending in this state. Let these blocks B_1,
B_2, ... have lengths n_1, n_2, ..., n_i, ... and conditional
probabilities p_1, p_2, ..., p_i, ... when we start from state 1.
We first prove

Theorem:

    \sum p_i n_i = P_1^{-1}

The first part is true since the ergodic character of the system
makes the inverse frequency of occurrence of state 1 equal
to the mean distance between its occurrences, Σ p_i n_i. The
second part is true since almost all blocks of large length N
have approximated the proper frequency of each B_i.

Now we return to the construction of a subset of growth
2^{(H - δ)N}, all of equal probability. Let us choose integers
a_i as close as possible to


and construct sequences with a_i of the block B_i. The number
of blocks is then

and  the  number  of  sequences: 

»  <-  Pt  log  pt 

The growth is then, in terms of symbols,

lag*  . ,  *  4*  . 

This proves the following:

Theorem: Given δ > 0 there exists a set of M blocks of length N
(when N is sufficiently large) such that

    M > 2^{(H - \delta) N}

and each block has the same probability, and starts and ends in
the same state, which can be chosen arbitrarily.

In case the system is not ergodic but made up of a
finite number of ergodic systems:

    \Gamma = \sum_i \alpha_i \Gamma_i

each Γ_i will have a rate H_i, which we may assume arranged in a
non-increasing sequence

    H_1 \ge H_2 \ge \cdots

The function φ(ε) then becomes a decreasing step function in the
manner indicated by the following:

Theorem: In the case considered

    \varphi(\varepsilon) = H_K \quad \text{in the interval} \quad \sum_{i=1}^{K-1} \alpha_i < \varepsilon < \sum_{i=1}^{K} \alpha_i.

For if ε is in the range indicated we must take a set
of positive probabilities from at least one of Γ_1, ..., Γ_K.
This gives a growth of type

at  least,  and  can  be  limited  to  this  by  choosing  all  sequences 
The  quantity 

will be called the mean statistical rate for the system.

C. E. SHANNON

April  26,  194* 


Samples of Statistical English
C. E. Shannon

A number of samples of statistical English including
probability structure out to four words are given below. These
were constructed by starting off with three words from a book.
These three words are shown to someone who fits them in a
reasonable English sentence and writes down the word following
the three. The first word is then covered up and the process
repeated with a different person, etc. If the imagined sentence
ends after the added word, the person writing the word adds a
period. For samples bearing a title the participants were told
that this was the subject dealt with. These samples may be
compared with those in "A Mathematical Theory of Communication"
where less statistical structure is included.
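
The procedure above can be imitated mechanically by letting a table of word sequences stand in for the human participants. The sketch below is an assumed analogue only, not the method used for the samples: it picks each new word at random from the words that follow the preceding three in a given text, and the tiny corpus, the seed, and the function names are all illustrative.

```python
import random
from collections import defaultdict


def build_table(words, order=3):
    """Map every run of `order` consecutive words to the words seen after it."""
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    return table


def generate(words, length=20, order=3, seed=0):
    """Start from three consecutive words of the text, then repeatedly add a
    word that followed the last three words somewhere in the text."""
    rng = random.Random(seed)
    table = build_table(words, order)
    start = rng.randrange(len(words) - order)
    out = list(words[start:start + order])
    while len(out) < length:
        followers = table.get(tuple(out[-order:]))
        if not followers:      # the last three words never recur: stop early
            break
        out.append(rng.choice(followers))
    return out


corpus = (
    "the cat sat on the mat and the dog sat on the rug "
    "and the cat sat on the rug and the dog sat on the mat"
).split()

sample = generate(corpus)
```

Every run of four consecutive words in `sample` occurs somewhere in the corpus, which is the mechanical counterpart of "probability structure out to four words"; a person, unlike the table, can supply a plausible word for three words never seen together before.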

The samples given here were obtained, for the most
part, with the aid of J. R. Pierce, B. McMillan, C. C. Cutler
and W. E. Mathews. A few of the samples were obtained from
other sources (contemporary literature, etc.) and are included
for comparison. The reader may try his skill at guessing which
are statistically constructed. The true sources are given at
the end.

1.  This  was  the  first.    The  second  time  it  happened  without 
his  approval.    Nevertheless  it  cannot  be  done.    It  could 
hardly  have  been  the  only  living  veteran  of  the  foreign 
power  had  stated  that  never  more  could  happen.  Conse- 
quently people  seldom  try  it. 

2.  John  now  disported  a  fine  new  hat.    I  paid  plenty  for  the 
food.    When  cooked  asparagus  has  a  delicious  flavor  sug- 
gesting apples.    If  anyone  wants  my  wife  or  any  other 
physicist  would  not  believe  my  own  eyes.    I  would  believe 
my  own  word. 

3. That was a relief whenever you be let your mind go free
who knows if that pork chop I took with my cup of tea
after was quite good with the heat I couldn't smell anything
off it I'm sure that queer looking man in the

4.  In  a  few  days  was  the  minimum  amount  of  money  remaining  to 
the  end.    However  everyone  knows  the  meaning  implied.  It 
was true when Cutler says that we should proceed carefully.
When you love yourself too much. The woman who
accosted 

5.  Fourscore  and  twenty  years  passed  before  we  could  meet  them 
that  isn't  already  done  should  have  been  a  good  son  is 
going  fast  according  to  the  teacher  of  his  ability.  His 
intelligence  sufficed  for  the  time.    This  cannot  change 
much. 
6. Even the killing was atrociously perpretated by the
cruelest  treatment  that  a  small  boy  jumped  over  the  hedge 
and  buried  her.    A  grave  fault  of  many  approaches  to  the 
furthermost  reaches  of  the  state.    Politics  and  business 
are  becoming  lost  to  the . 

7.  It  is  an  Italian  ox  mouth  dish.    The  only  thing  in  the 
room  is  worms.    I  am  the  director  of  the  seminar.    In  an 
evolving  hemisphere.    C'est  Monsieur  Jardin.    I  am  a 
patient.    Oh  my  dear  Plapsen,  you  are  my  dearest  Klapsen. 

8. He took it with many other matters are more apparent if
they  think  so.    Is  there  a  reason  for  supposing  that 
most  people  don't.    Nevertheless  sex  is  absolutely  neces- 
sary as  though  the  electron  diffraction  camera  plate  up 
on  the  top  surface  of 

9. Fifteen years before the mast, he ever had eaten. Try
it and see. I believe that whatever arises a fund has
been accumulated sufficiently in the near future holds
many surprises. No man can judge his actions by his wife
Susie.

10.  I  forget  whether  he  went  on  and  on.    Finally  he  stipulated 
that  this  must  stop  immediately  after  this.    The  last  time 
I saw him when she lived. It happened one frosty look of
trees  waving  gracefully  against  the  wall.    You  never  can 

11.  When  I  bought  my  wife  a  long  time  ago.    I  knew  that  it 
wasn't  faster  when  he  didn't  eat  or  drink  a  toast  to 
John  Doe,  otherwise  known  as  McMillan's  theorem. 
Whatever  the  nature  of  Christ's  teachings.    Go  far  into 

12.  McMillan's  Theorem 

McMillan's  theorem  states  that  whenever  electrons  diffuse 
in  vacua.    Conversely  impurities  of  a  cathode.    No  sub- 
stitution of  variables  in  the  equation  relating  these 
quantities.    Functions  relating  hypergeometric  series 
with  confluent  terms  converging  to  limits  uniformly 
expanding  rationally  to  represent  any  function. 

13. House Cleaning

First  empty  the  furniture  of  the  master  bedroom  and  bath. 
Toilets  are  to  be  washed  after  polishing  doorknobs  the 
rest  of  the  room.    Washing  windows  semi-annually  is  to  be 
taken  by  small  aids  such  as  husbands  are  prone  to  omit 
14. Epiminondas

Epiminondas  was  one  who  was  powerful  especially  on  land 
and  sea.    He  was  the  leader  of  great  fleet  maneuvers  and 
open  sea  battles  against  Pelopidas  but  had  been  struck  on 
the  head  during  the  second  Punic  war  because  of  the  wreck 
of  an  armored  frigate. 

15.  Salaries 

Money  isn't  everything.    However,  we  need  considerably 
more  incentive  to  produce  efficiently.    On  the  other  hand 
too little and too late to suggest a raise without a reason
for  remuneration  obviously  less  than  they  need  although 
they  really  are  extremely  meager. 

16.  Murder  Story 

When  I  killed  her  I  stabbed  Claude  between  his  powerful 
jaws  clamped  cruelly  together.    Screaming  loudly  despite 
fatal  consequences  in  the  struggle  for  life  began  ebbing 
as  he  coughed  hallowly  spitting  blood  from  his  ears. 
Burial  seemed  unnecessary  since  further  division  was 
necessary. 


The  sources  are:     3,  from  "Ulysses"  by  James  Joyce, 
page  748;  7  and  14  are  the  conversation  and  writings  of  two 
schizophrenic  patients  (quoted  from  Bleuler,  "A  Textbook  of 
Psychiatry").    All  others  constructed  by  statistical  means. 
5. SIGNIFICANCE AND APPLICATION


C. E. Shannon
Bell Telephone Laboratories
Murray Hill, N. J.


1.  Introduction. 

A  general  communication  system  is  shown  in  Figure  3.  An  information  source 
produces  a  message.  This  is  encoded  in  a  transmitter  to  produce  a  signal  suitable  for 
transmission  over  the  channel.  During  transmission  the  signal  may  be  perturbed  by 
noise.  The  perturbed  signal  is  decoded  or  demodulated  at  the  receiver  to  recover,  as 
well  as  possible,  the  original  message. 

The  situation  is  roughly  analogous  to  a  transportation  system  for  transporting  physical 
goods  from  one  point  to  another.  We  can  imagine,  for  example,  a  lumber  mill  producing 
lumber  at  an  average  rate  of  R  cubic  feet  per  second  and  a  conveyor  system  capable  of 
transporting  C  cubic  feet  per  second.  If  R  is  greater  than  C  the  full  output  of  the  mill 
cannot  possibly  be  carried  on  the  conveyor.  On  the  other  hand,  if  R  is  less  than  or  equal 
to  C  it  may  or  may  not  be  possible,  depending  on  whether  the  lumber  can  be  efficiently 
packed in the available space of the conveyor. However, if we allow ourselves to saw
the lumber up into suitable sizes and shapes we can always approach 100 per cent
efficiency in packing. In this case we must, of course, supply a carpenter shop at the other
end of the conveyor to reassemble the lumber in its original form before passing it on
to the consumer.

If the analogy is sound we might hope to define two parameters R and C associated
with an information source and a channel, respectively. R should measure, in some
sense, how much information is produced per second by the source, and C the capacity
of the channel when used in the most efficient manner for transmitting information. We
would expect then that if R > C the full output of the source cannot be transmitted
satisfactorily. If R ≤ C it should be possible to transmit the output of the source by proper
encoding and decoding at transmitter and receiver. It turns out that it is possible to
define quantities R and C which measure these information rates and capacities and
satisfy the desired relationships. We will attempt to show how this can be done without,
however, giving mathematical proofs of the results.1

1 For mathematical details, see Shannon, C. E., "A Mathematical Theory of
Communication," Bell System Technical Journal, July and October, 1948. See also Shannon, C. E.,
"Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).

2. The Information Source.

The first problem is that of clarifying the nature of "information" and finding a
measure of the rate of production for an information source.

Information involves basically the concept of "choice." An information source
chooses one particular message from a set of possible messages. If there were only
one possible message there would be no communication problem. The amount of
information produced by a source must evidently be related to the range of choice available.

The  simplest  possible  choice  is  a  choice  from  two  equally  likely  possibilities,  say 
0  or  1.  We  shall  call  the  corresponding  unit  of  information  a  binary  digit  or  "bit."  A 
relay  or  flip-flop  circuit  has  two  possible  states  and  is  capable  of  storing  one  bit  of 
information. 

A  device  which  chooses  at  random  from  0  or  1  making  one  choice  each  second  is 
considered  to  be  producing  information  at  rate  R  of  one  bit  per  second.  Such  a  source 
produces a "message" which is a random sequence of 0's and 1's.

A choice from, say, 32 equally likely possibilities can be considered as a series of five
choices, each from two equally likely possibilities, and, therefore, should correspond to
five bits. More generally, a choice from n equally likely possibilities represents
$\log_2 n$ bits.

Suppose now that the various possible choices have different probabilities of occurrence,
say $p_1, p_2, \ldots, p_n$. How much information is produced when a choice is made under
these  circumstances?  One  feels  intuitively  that  less  "choice"  is  involved  in  a  device 
which  chooses  between  0  and  1  with  probabilities  .01  and  .99  than  in  one  which  chooses 
with  equal  probabilities.  In  the  former  case  the  result  is  almost  sure  to  be  1. 

The  following  example  shows  that  by  proper  encoding  an  average  compression  can  be 
obtained by using the probabilities $p_1, p_2, \ldots, p_n$. Suppose there are four possible choices
A, B, C, D with probabilities pA = 1/2, pB = 1/4, pC = 1/8, pD = 1/8. If we use a simple
direct  code  into  binary  digits: 

A  =  00       B  =  01       C  =  10       D  =  11, 

we  use  two  binary  digits  per  letter.  On  the  other  hand,  using  the  following  code  where 
more  probable  letters  are  given  short  codes  and  less  probable  letters  longer  codes,  we 
obtain  an  average  saving 

A = 0       B = 10       C = 110       D = 111.

This  is  a  reversible  code;  the  original  text  can  be  recovered  from  the  encoded  sequences 
as  is  readily  verified.  With  this  code  we  need,  on  the  average,  only 

(1/2  x  1  +  1/4  x  2  +  1/8  x  3  +  1/8  x  3)  =  1  3/4 

binary  digits  per  letter.  We  may  say  then  that  a  choice  with  probabilities  1/2,  1/4,  1/8, 
1/8  corresponds  to  1  3/4  bits  of  information.  If  an  information  source  were  producing 
a  sequence  of  the  letters  A,  B,  C,  D  with  these  probabilities  we  could  encode  it  into  a 
sequence of binary digits in which 1 3/4 binary digits are used on the average for each
letter of message.

A general analysis of the situation shows that if the letters are chosen with probabilities
$p_1, p_2, \ldots, p_n$ then it is possible to encode into binary digits using

$$H = -\sum_i p_i \log_2 p_i$$

binary  digits  per  letter  of  message  on  the  average,  and  there  is  no  method  of  reversible 
encoding  using  less.  This  H  then  is  the  equivalent  number  of  bits  per  letter,  and,  if  the 
source  produces  n  letters  per  second,  R  =  nH  is  the  rate  of  production  in  bits  per  second. 
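
The four-letter example above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the decoder simply relies on the fact that no code word is a prefix of another.

```python
from math import log2

def entropy_bits(probs):
    """H = -sum p_i log2 p_i, in bits per letter."""
    return -sum(p * log2(p) for p in probs if p > 0)

def average_code_length(probs, code):
    """Average number of binary digits per letter for a given code."""
    return sum(probs[s] * len(code[s]) for s in probs)

def encode(message, code):
    return "".join(code[s] for s in message)

def decode(bits, code):
    """Decode by accumulating digits until a complete code word is seen."""
    inverse = {word: s for s, word in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)

# Probabilities and variable-length code taken from the text.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

H = entropy_bits(probs.values())
L = average_code_length(probs, code)
```

For this particular source the average length of the code equals H, which is why no reversible code can do better here.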
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In  the  case  of  English  text  the  statistical  structure  is  more  involved.  There  are  the 
various letter probabilities $p_i$, but, also, there are statistical influences between nearby
letters. For example, the letter T is more often followed by H than by any other letter,
a Q is almost invariably followed by U, etc. In such cases there is a more general formula
for calculating the equivalent number of bits per letter of message. Let $p(i, j, \ldots, s)$ be
the probability in the language of the sequence of letters i, j, ..., s. Then we define $G_n$

$$G_n = -\frac{1}{n} \sum_{i, j, \ldots, s} p(i, j, \ldots, s) \log_2 p(i, j, \ldots, s)$$

where the sum is over all sequences of letters which are just n letters long. The sequence
$G_1, G_2, \ldots, G_n, \ldots$ represents a series of approximations to the desired H which
takes into account more and more of the statistical structure as we proceed along the
sequence. The information per letter of message can be defined by the limiting value of
the G's.

$$H = \lim_{n \to \infty} G_n$$


It  can  be  shown  that  H  has  the  desired  properties;  namely,  we  can  encode  the  messages 
from  the  source  into  binary  digits  using  H  binary  digits  per  letter  on  the  average,  and  no 
method  of  encoding  uses  less. 

For  the  English  language  H  has  been  estimated  at  roughly  2  bits  per  letter,  taking 
account  only  of  the  statistical  structure  out  to  about  6  or  8  letters. 
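
As a rough illustration of how the approximations G_n behave, the sketch below estimates them from a sample of text by counting overlapping blocks of n letters. This is an assumption-laden estimate: block probabilities are replaced by relative frequencies in the sample, and the function name `g_n` is illustrative only.

```python
from collections import Counter
from math import log2

def g_n(text, n):
    """Estimate G_n = -(1/n) * sum p(block) log2 p(block) from a sample,
    using relative frequencies of overlapping n-letter blocks."""
    blocks = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    block_entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return block_entropy / n

# A strictly alternating source: each letter is fixed by its predecessor,
# so the per-letter estimate falls as longer blocks are considered.
sample = "AB" * 50
g1 = g_n(sample, 1)
g2 = g_n(sample, 2)
g4 = g_n(sample, 4)
```

With real English text a long sample is needed before the estimates for larger n become meaningful, since the number of possible blocks grows rapidly with n.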

If  the  messages  produced  by  the  information  source  are  continuous  functions  of  time 
as in speech or television transmission, the situation is much more involved and we will
not discuss it in detail. It is still possible to assign a rate of production of information
in bits per second to such a source, but the rate now depends on other considerations.
With  continuous  functions  as  messages,  exact  reproduction  is  not  generally  required  and 
the  rate  R  depends  on  the  amount  and  nature  of  the  discrepancy  which  can  be  tolerated 
between  the  original  and  recovered  messages.  The  tolerable  discrepancy  in  turn  is 
determined  by  the  final  destination  of  the  messages.  With  speech,  for  example,  the  toler- 
able errors  depend  on  the  structure  of  the  human  ear  and  brain. 

Although  the  mathematical  problems  involved  in  defining  the  rate  for  a  continuous 
source  have  been  completely  solved,  it  is  in  practical  cases  very  difficult  to  estimate  R. 
The  following  calculation  may  be  of  some  interest,  however.  Suppose  we  are  interested 
only  in  transmitting  English  speech  (no  music  or  other  sounds),  and  the  quality  require- 
ments on  reproduction  are  only  that  it  be  intelligible  as  to  meaning.  Personal  accents, 
inflections, etc., can be lost in the process of transmission. In such a case we could, at
least in principle, transmit by the following scheme. A device is constructed at the
transmitter which prints the English text corresponding to the spoken words. These can be
translated into binary digits in the ratio of about two binary digits per letter, or 2 x 4.5 = 9
per word. Taking 100 words per minute as a reasonable talking speed we obtain 900 bits
per minute or 15 bits per second as an estimate of the rate for English speech when
intelligibility is the only fidelity requirement.

3.  The  Capacity  of  a  Channel. 

We  now  consider  the  problem  of  defining  the  capacity  C  of  a  channel  for  transmitting 
information. Since we have measured the rate of production for an information source in
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mitted  over  a  given  channel? 

in  some  cases  the  answer  Is  simple.  With  a .  tele «»J%*£Z ^second, 

can  send  5n  bits  per  second. 

Suppose  now  that  the  channel  is  defined  £ fc^j. JJ- ^  Vyclef pTrse^nfwide . 
tions  of  time  f(t)  which  lie  within  a  cer^»^  a  series  of 

It  is  known  that  a  function  of  thi^type  can  be  J£j  say  that  such  a  function 

equally  spaced  sampling  points^  seconds  apart    Thus  we  may  say 
has  2W  degrees  of  freedom,  or  dimensions,  per  second. 

If  there  is  no  noise  whatever  » 

Even  when  there  is  noise,  if  we  place  no ^tjon s  ^JgPSSS!SSU 
capacity  will  be  infinite  for  we  m **£W2?£tof  e«    p  transmitter 
number  of  different  amplitude  levels  .^^nw^etevres  The  capacity  depends,  of 

limitation. 

The  shiest  type  o,  noise  is  white  V^tt'S^K''' 
distribution  of  ampUt^s  is  Ga**ta, and  to  a  eetrnmr s  ilat q      7  ^  tf 

into  a  unit  resistance. 

The  simplest  limitation  on  transmitter  power  is  ^^^S^£%M 
SLr«TL£T£K  SLrto/eTarametLs  W,  P,  and  N, 
the  capacity  C  can  be  calculated.  It  turns  out  to  be 

$$C = W \log_2 \frac{P + N}{N} \quad \text{(bits per second)}.$$


$$\sqrt{\frac{P + N}{N}}$$


different  amplitudes  at  each  sample  point.  In  a  time  T  there  will  be  2TW  independent 
samples.  Thus,  there  are  approximately 

$$M = \left(\sqrt{\frac{P + N}{N}}\right)^{2TW} = \left(\frac{P + N}{N}\right)^{TW}$$

different  signal  functions  of  duration  T  that  can  be  distinguished  from  one  another  in  spite 
of  the  noise.  This  corresponds  to 
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log2  M  =  TW  log2  P  ftN 

binary  digits  in  the  time  T  or 

$$C = W \log_2 \frac{P + N}{N}$$

binary  digits  per  second.  This  formula  has  a  much  deeper  and  more  precise  signifi- 
cance than  the  above  argument  would  indicate.  In  fact  it  can  be  shown  that  it  is  possible, 
by properly choosing our signal functions, to transmit $W \log_2 \frac{P + N}{N}$ binary digits per
second  with  as  small  a  frequency  of  errors  as  desired.  It  is  not  possible  to  transmit 
binary  digits  at  any  higher  rate  with  an  arbitrarily  small  frequency  of  errors.  This 
means  that  the  capacity  is  a  sharply  defined  quantity  in  spite  of  the  noise.  These  state- 
ments are  proved  by  two  different  methods. * 

The  formula  for  C  applies  for  all  values  of  P/N.  Even  when  P/N  is  very  small,  the 
average noise power being much greater than the average transmitter power, it is possible
to transmit binary digits at the rate $W \log_2 \frac{P + N}{N}$ with as small a frequency of
errors as desired. In this case $\log_2 (1 + \frac{P}{N})$ is approximated by $\frac{P}{N} \log_2 e = 1.443 \frac{P}{N}$
and we have approximately

$$C = 1.443 \frac{PW}{N}$$
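
The capacity formula and its small-P/N approximation can be compared directly. The sketch below is illustrative only; the function names and the sample values of W, P and N are assumptions, not figures from the text.

```python
from math import e, log2

def capacity(W, P, N):
    """C = W log2((P + N) / N), in bits per second."""
    return W * log2((P + N) / N)

def capacity_low_snr(W, P, N):
    """Approximation C ~ W (P/N) log2 e, valid when P/N is very small."""
    return W * (P / N) * log2(e)

# Hypothetical channel: 3000 cycles per second wide, signal 1000 times the noise.
c_strong = capacity(3000, 1000.0, 1.0)

# Hypothetical weak-signal case: noise 1000 times the signal.
c_weak = capacity(3000, 0.001, 1.0)
c_weak_approx = capacity_low_snr(3000, 0.001, 1.0)
```

In the weak-signal case the exact and approximate values agree closely, and the capacity is then nearly proportional to P/N.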

It  should  be  emphasized  that  it  is  only  possible  to  transmit  at  a  rate  C  over  a  channel 
by  properly  encoding  the  information.  In  general,  the  rate  C  is  only  approached  as  a  limit 
by  using  more  and  more  complex  encoding  and  longer  and  longer  delays  at  both  trans- 
mitter and  receiver.  In  the  white  noise  case  the  best  encoding  is  such  that  the  transmitted 
signals  themselves  have  the  structure  of  a  white  noise  with  power  P.  The  difficulty  with 
the  approximate  argument  given  for  that  case,  and  the  reason  it  does  not  give  a  sharply 
defined capacity, is that the selection of signals is not optimal. The distribution of
amplitudes is not Gaussian as it should be.

4.  Comparison  of  Ideal  and  Practical  Systems.  * 

In  Figure  4  the  curve  is  the  function 

$$\frac{C}{W} = \log_2 \left(1 + \frac{P}{N}\right)$$

plotted  against  P/N  measured  in  db.  It  represents,  therefore,  the  channel  capacity  per 
unit  of  band  with  white  noise.  The  circle  and  points  correspond  to  PCM  and  PPM  systems 
used to send a sequence of binary digits and adjusted to give about one error in 10^5 binary
digits.  In  the  PCM  case  the  number  adjacent  to  a  point  represents  the  number  of  ampli- 
tude levels  -  3  for  example  is  a  ternary  PCM  system.  In  all  cases  positive  and  negative 
amplitudes  are  used.  The  PPM  systems  are  quantized  with  a  discrete  set  of  possible 
positions  for  the  pulse,  the  spacing  is  ^j,  and  the  number  adjacent  to  a  point  is  the  num- 
ber of  possible  positions  for  a  pulse. 

The  series  of  points  follows  a  curve  of  the  same  shape  as  the  ideal  but  displaced 
horizontally  about  8  db.  This  means  that  with  more  involved  encoding  or  modulation  sys- 
tems a  gain  of  8  db.  in  power  could  be  achieved  over  the  system  indicated. 


See  Shannon,  C.  E.,  "Mathematical  Theory  of  Communication"  and  "Communication 
in  the  Presence  of  Noise." 
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Of  course,  as  one  attempts  to  approach  the  ideal,  the  transmitter  and  receiver  re- 
quired become  more  complicated  and  the  delays  increase.  For  these  reasons  there  will 
be some point where an economic balance is established between the various factors.
It  may  well  be,  however,  that  even  at  the  present  time  more  complex  systems  would  be 
justified. 

A  curious  fact  illustrating  the  general  misanthropic  behaviour  of  Nature  is  that  at 
both extremes of P/N (when we are well outside the practical range) the points
in Figure 4 approach more closely the ideal curve. At very large P/N the PCM points
Approach  to  within  10  log10#  =  4.5  db.  of  the  ideal  while  with  very  small  P/N  the  PPM 
points  approach  to  within  3  db.  The  relation 

$$C = W \log_2 \left(1 + \frac{P}{N}\right)$$

can  be  regarded  as  an  exchange  relation  between  the  parameters  W  and  P/N.  Keeping  the 
channel capacity fixed we can decrease the bandwidth W provided we increase P/N sufficiently.
Conversely, an increase in band allows a lower signal-to-noise ratio in the
channel. The required P/N in db. is shown in Figure 5 as a function of the band W. It is
assumed  here  that  as  we  increase  W,  N  increases  proportionally: 

N  =  W  N0 

where  N0  is  the  noise  power  per  cycle  of  band.  It  will  be  noticed  that  if  P/N  is  large  a 
reduction  of  band  is  very  expensive  in  power.  Halving  the  band  roughly  doubles  the 
signal-to-noise  ratio  in  db.  that  is  required. 
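
The exchange between band and signal-to-noise ratio follows from solving the capacity relation for P/N. The sketch below is a minimal illustration with assumed numbers (a capacity of 10 bits per second and bands of 1 and 1/2 cycle per second); the function name is hypothetical.

```python
from math import log10

def required_snr_db(C, W):
    """P/N in db needed to reach capacity C with band W,
    from C = W log2(1 + P/N), i.e. P/N = 2**(C/W) - 1."""
    snr = 2 ** (C / W) - 1
    return 10 * log10(snr)

full_band_db = required_snr_db(10.0, 1.0)   # about 30 db
half_band_db = required_snr_db(10.0, 0.5)   # about 60 db
```

With these numbers, halving the band raises the required ratio from roughly 30 db. to roughly 60 db., in line with the remark that the ratio in db. is roughly doubled when P/N is large.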

The  channel  capacity  C  can  be  calculated  in  many  other  cases.  A  general  result  that 
applies  in  any  situation  where  the  average  transmitter  power  is  limited  to  P  is  that  the 
channel  capacity  is  bounded  by: 

$$W \log_2 \frac{P + N_1}{N_1} \leq C \leq W \log_2 \frac{P + N}{N_1}$$

where $N_1$ is a parameter called the "entropy power" of the noise. It is defined as the
power in a white noise having the same entropy as the actual noise. N is, as before, the
average  noise  power. 
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Note  on  Certain  Transcendental  Numbers 
Claude  E.  Shannon 


This  note  calls  attention  to  a  certain  class  of 
numbers  that  are  easily  shown  to  be  transcendental  but  seem 
to  have  escaped  previous  notice.     A  typical  example  is  the 
number 

$$X = 2^{-2^{-2^{\cdot^{\cdot^{\cdot}}}}}$$

or more precisely $X = \lim_{n \to \infty} X_n$, $X_{n+1} = 2^{-X_n}$, $X_0 = 2$. It is
easily seen that X exists and satisfies the equation $X = 2^{-X}$.
It is known from a conjecture of Hilbert, proved by Gelfond
and by Schneider, that $a^x$ is transcendental if a ≠ 0, 1 is
algebraic and x is an algebraic irrational. Now X is clearly
not  rational,  and  if  we  suppose  it  an  algebraic  irrational, 
it  must  then  be  transcendental,  a  contradiction.    Hence  it  is 
transcendental. 
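
The limit can be approximated numerically. The sketch below assumes the recursion reads X_{n+1} = 2^(-X_n) with X_0 = 2, which is one reading of the damaged display in this copy; the function name is illustrative.

```python
def iterate_power_tower(x0=2.0, steps=200):
    """Iterate x -> 2**(-x) starting from x0 (assumed form of the recursion)."""
    x = x0
    for _ in range(steps):
        x = 2.0 ** (-x)
    return x

X = iterate_power_tower()
```

Under that assumption the iterates settle near 0.641, and the value returned satisfies X = 2^(-X) to machine precision.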

More  generally  let  f  be  a  function  such  that  if 
x  is  algebraic  and  does  not  belong  to  a  set  S,  then  f(x)  is 
transcendental.    Let  g1  and  g2  be  algebraic  functions  and 

such  that  x  f  g1fg2x,  xeS.     Then  the  solutions  of 

are  transcendental  by  a  similar  argument ,  using  the  fact  that 
g£  is  algebraic.  If  the  sequence  Xn  =  (g1fg2)1X0  approaches 
a  limit  X  it  must  be  transcendental.  Some  functions  known  to 
have the property required for f are sin x, $e^x$ and $J_0(x)$, the
exceptional  set  S  consisting  of  the  number  0. 


C. E. SHANNON
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\    '.  A  CASE  OF  EFTIC1EHT  CGDI83  FOl  A  BOIST  CHAH38L 


Consider a discrete channel with two possible symbols
0 and 1. Noise is assumed to affect successive symbols independently
and in such a way that the probability of a symbol
being interpreted correctly at the receiver is p = (1 + ε)/2, while
the probability of incorrect interpretation is q = (1 − ε)/2. The
capacity of such a channel is


-  e2 

Ve  e©»us»  e  very  soall  and  epproximte  log  (1  ♦  c)  by  z 

2 

*  e2  (natural  units) 
In bits per symbol, the capacity is:

C  -      log*,  a 

A very simple code can be constructed for this system
to send a sequence of random binary digits at nearly the rate C
with a quite small frequency of errors; in other words a code
which is not far from the ideal. The code is merely to repeat
each binary digit in the message a large number n of times. At
the receiver, a group of n is received, and the majority report
is taken as the original message symbol.


If the message symbol is 0 then n 0's are transmitted.
At the receiver the n received symbols will be a mixture of
0's and 1's; the number of 0's present will be distributed according
to a binomial distribution with p = (1 + ε)/2 and q = (1 − ε)/2.


For large n the binomial distribution is approximately normal
(and this approximation is especially good when p is close to
1/2). The expected number of 0's is pn, and the standard deviation
is:

$$\sigma = \sqrt{npq}$$


An error occurs when the number of received 0's is less than n/2,
i.e. when the actual number of zeros is pn − n/2 away from
the expected number. In terms of σ this is:

$$a = \frac{pn - n/2}{\sqrt{npq}} \text{ standard deviations.}$$


Hence the frequency of errors is given by the area of a normal
curve with standard deviation equal to unity from a out to ∞.

To  obtain  a  frequency  of  errors  10*3,  say,  we  mist 
have  a  ■  1*5 


n 

t 


and  the  rate  is  -JL.  as  coopered  with  the  rate  1«.&5  the 

2.3 

ideal  (with  essentially  zero  froquency  of  errors). 
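
The behaviour of the repetition code can be sketched numerically. The code below is an illustration under stated assumptions: n is taken odd so that a majority always exists, p is passed in directly as the probability of correct reception, and the function names are hypothetical. It compares the exact binomial error frequency with the normal-curve estimate described above.

```python
from math import comb, erfc, sqrt

def majority_error_exact(n, p):
    """Exact probability that fewer than half of n (odd) repeated symbols
    are received correctly, each being correct with probability p."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) for k in range((n + 1) // 2))

def majority_error_normal(n, p):
    """Normal approximation: area of a unit normal curve from a out to
    infinity, where a = (pn - n/2) / sqrt(npq)."""
    q = 1 - p
    a = (p * n - n / 2) / sqrt(n * p * q)
    return 0.5 * erfc(a / sqrt(2))

short_block = majority_error_exact(21, 0.6)
long_block = majority_error_exact(201, 0.6)
long_block_estimate = majority_error_normal(201, 0.6)
```

Lengthening the block lowers the error frequency, at the cost of a transmission rate of only one message digit per n channel symbols.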


Hovenber  IS, 


C. E. SHANNON


December  6,  1943 


Note  on  Reversing  A  Discrete  Markhoff  Process 

In  "A  Mathematical  Theory  of  Communication"  a 
language  was  represented  by  a  discrete  Markhoff  process  with 
a  finite  number  of  possible  states.    Such  a  stochastic  process 
can  be  represented  schematically  by  means  of  an  oriented  linear 
graph as in Fig. 1.

Consider  the  question  of  generating  the  same  language 
in  reverse;  for  example,  English  but  read  backwards.    Can  we 
always  invert  a  finite  state  Markhoff  process  and  obtain  a 
finite state Markhoff process? The answer is "yes" and furthermore
the corresponding linear graph has the same topology, but
with reversed orientation on all branches. If the
original process has transition probabilities $p_i(j)$ (probability when in state
i of going to state j), then the reverse process has the same
state probabilities $P_i$ and the transition probabilities given by:

$$q_j(i) = \frac{P_i}{P_j} p_i(j)$$


This  is  true  since  this  qj(i)  is  merely  the  a  posteriori  probability 
for  the  original  process  that  when  in  state  j  the  preceding  state 
was  state  i.    The  inverse  of  Fig.  1  is  shown  in  Fig.  2. 

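It is interesting to show directly that the entropy
$H_R$ of the reverse process is equal to the entropy $H_F$ of the
forward process. Of course, this must be true a posteriori from
the general properties of entropy. We have

$$P_j q_j(i) = P_i p_i(j)$$

Hence:

$$\sum_{i,j} P_j q_j(i) \log P_j q_j(i) = \sum_{i,j} P_i p_i(j) \log P_i p_i(j)$$

or

$$\sum_{i,j} P_j q_j(i) \log q_j(i) + \sum_{i,j} P_j q_j(i) \log P_j = \sum_{i,j} P_i p_i(j) \log p_i(j) + \sum_{i,j} P_i p_i(j) \log P_i$$

Hence:

$$-H_R + \sum_j P_j \log P_j = -H_F + \sum_i P_i \log P_i$$

The two sums over the state probabilities are identical, so that $H_R = H_F$.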
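
The reversal and the equality of entropies can be checked on a small example. The sketch below is illustrative only: the three-state transition matrix is made up, the state probabilities are found by simple repeated multiplication, and all function names are assumptions.

```python
from math import log2

def stationary(p, iters=10000):
    """State probabilities P_i of a chain with transition matrix p[i][j],
    found by repeatedly applying the transition probabilities."""
    n = len(p)
    P = [1.0 / n] * n
    for _ in range(iters):
        P = [sum(P[i] * p[i][j] for i in range(n)) for j in range(n)]
    return P

def reverse_chain(p, P):
    """Transition matrix of the reversed process: q[j][i] = P_i p_i(j) / P_j."""
    n = len(p)
    return [[P[i] * p[i][j] / P[j] for i in range(n)] for j in range(n)]

def entropy_rate(p, P):
    """H = -sum_i P_i sum_j p_i(j) log2 p_i(j), in bits per symbol."""
    n = len(p)
    return -sum(P[i] * p[i][j] * log2(p[i][j])
                for i in range(n) for j in range(n) if p[i][j] > 0)

# Hypothetical three-state forward process (rows sum to 1).
p = [[0.9, 0.1, 0.0],
     [0.0, 0.5, 0.5],
     [0.6, 0.0, 0.4]]

P = stationary(p)
q = reverse_chain(p, P)
H_forward = entropy_rate(p, P)
H_reverse = entropy_rate(q, P)
```

The reversed matrix has a nonzero entry from j to i exactly where the forward matrix has one from i to j, which is the reversed orientation of the same graph.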


C. E. SHANNON
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Outline  of  Talk 
American  Statistical  Society,  December  28,  1949 

INFORMATION  THEORY 
by 

C. E. Shannon

Bell Telephone Laboratories, Inc., Murray Hill, N. J.

1. Information Produced by a Stochastic Process

In communication engineering, we are interested in
transmitting messages from one point to another. The messages
generally consist of a sequence of individual symbols, such as
the letters of printed English, which are governed by probabilities.
Thus, in English, there are the various letter frequencies,
digram frequencies, etc. The "meaning" of the
message (if any) is irrelevant to the engineering problem.
Abstractly, then, we may consider a message to be a sequence of
meaningless symbols produced by a suitable Stochastic process.
Communication  systems  must  be  designed  to  handle  the  ensemble 
of  possible  messages;  the  particular  one  which  will  actually 
occur  is  not  known  when  the  system  is  constructed.    The  source 
producing  messages  is  assumed  to  have  only  a  finite  number  of 
possible  internal  states. 

2. Entropy as a Measure of Information

A suitable measure of the amount of information produced
by a discrete Stochastic process is given by the entropy
H, where

$$H = -\lim_{N \to \infty} \frac{1}{N} \sum_{x_1, \ldots, x_N} p(x_1, \ldots, x_N) \log_2 p(x_1, \ldots, x_N)$$

in which $x_1, \ldots, x_N$ is a sequence of N symbols produced by
the process, $p(x_1, \ldots, x_N)$ is the probability of this sequence,
and the sum is over all sequences of this length.

The significance of the quantity H is that it is possible
to translate messages from a source with entropy H into a
sequence of binary digits (0 or 1) using, on the average, H + ε
binary digits per letter of the original message with any
positive ε. It is not possible to translate so that fewer are
used. Thus, H measures, in a sense, the equivalent number of
binary digits per letter of message. It can be shown that H
also determines the amount of channel capacity required for
transmission of the original messages.

entropy, $H_x(y)$, of one source relative to another. This
measures in a sense the uncertainty per letter of the y sequence
when the x sequence is known, or the amount of additional information
in the y sequence over that available in the x sequence.
$H_x(y)$ can be defined as follows:

$$H_x(y) = H(x, y) - H(x)$$

where H(x, y) is the entropy of the sequence whose elements are
the ordered pairs (x, y).
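
For a simple case the definition can be evaluated directly. The sketch below assumes the successive pairs (x, y) are produced independently of earlier pairs, so that the per-letter entropies reduce to entropies of a single joint distribution; the distributions and function names are illustrative.

```python
from math import log2

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal_x(joint):
    """Distribution of x from a joint distribution over pairs (x, y)."""
    out = {}
    for (x, _y), p in joint.items():
        out[x] = out.get(x, 0.0) + p
    return out

def conditional_entropy(joint):
    """H_x(y) = H(x, y) - H(x)."""
    return entropy(joint) - entropy(marginal_x(joint))

# y independent of x: knowing x removes none of the uncertainty in y.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
# y identical to x: knowing x leaves no uncertainty in y.
identical = {(0, 0): 0.5, (1, 1): 0.5}
# y usually equal to x: an intermediate case.
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
```

The three cases give one bit, zero bits, and an intermediate value respectively.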

3.    The  Nature  of  Information 

While  the  entropy  H  measures  the  amount  of  information 
produced by a Stochastic process, it does not define the information
itself. Thus two entirely different sources might
produce information at the same rate (same H) but certainly they
are not producing the same information. If we translate the
output of a particular source into a different "language" by a
reversible operation, the translation may be said to have the
same information as the original. Thus we are led to consider
the information of a Stochastic process as that which is common
to all translations obtained from the given process by members
of the group G of reversible translations, or, alternatively, as
the equivalence class of all processes obtained from the given
one by such translations. To avoid certain paradoxical situations,
involving infinite internal storage in the transducer
doing the translating, it is desirable to first limit the group
G to translations possible in transducers having a finite
number of possible internal states. The information associated
with a process may be denoted by a single letter, say X. Thus
X = Y means that Y can be obtained by a translation of X, and
conversely. It is possible to set up a metric satisfying the
usual postulates as follows:

$$\rho(x, y) = 2H(x, y) - H(x) - H(y).$$

With this metric it is possible to define limiting sequences of
elements, each of which is an information. Thus a Cauchy
sequence, $X_1, X_2, \ldots$, is defined by requiring that

$$\lim_{m, n \to \infty} \rho(X_m, X_n) = 0.$$

The introduction of these sequences as new elements (analogous
to irrational numbers) completes the space in a satisfactory
way and enables one to simplify the statement of various results.

4. The Information Lattice

A relation of inclusion, x ≥ y, between two information
elements x and y can be defined by

$$x \geq y \equiv H_x(y) = 0.$$

This essentially requires that y can be obtained by a suitable
finite state operation (or limit of such operations) on x. If
x ≥ y we call y an abstraction of x. If x ≥ y, y ≥ z, then
x ≥ z. If x ≥ y, then H(x) ≥ H(y). Also x > y means x ≥ y,
x ≠ y. The information element, one of whose translations is
the process which always produces the same symbol, is the 0
element, and x ≥ 0 for any x.

The sum of two information elements, z = x + y, is the
process which produces the ordered pairs $(x_n, y_n)$. We have

$$z \geq x, \qquad z \geq y$$

and there is no u < z with these properties; z is the least upper
bound of x and y.

The product z = xy is defined as the largest z such
that z ≤ x, z ≤ y; that is, there is no u > z which is an
abstraction of both x and y. The product is unique.


With these definitions information elements form a
metric lattice. The lattice is not distributive, nor even
modular. A non-distributive example is x, y independent
sequences of binary digits, with z the sequence obtained by
mod 2 addition of corresponding symbols in x and y. Then

$$zy + zx = 0 + 0 = 0$$
$$z(x + y) = z \neq 0.$$
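
The entropies behind this example can be computed for a single symbol position. The sketch below assumes x and y are independent, equally likely binary digits and z is their mod 2 sum; it does not compute lattice products in general, only the conditional entropies that make the example work. The function names are illustrative.

```python
from itertools import product
from math import log2

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, indices):
    """Marginal distribution over the chosen coordinates of each outcome."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out

def conditional_entropy(joint, given, target):
    """Uncertainty of the target coordinates when the given ones are known."""
    return entropy(marginal(joint, given + target)) - entropy(marginal(joint, given))

# Outcomes are triples (x, y, z) with z = x + y mod 2.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

z_given_x = conditional_entropy(joint, (0,), (2,))      # x alone says nothing about z
z_given_y = conditional_entropy(joint, (1,), (2,))      # nor does y alone
z_given_xy = conditional_entropy(joint, (0, 1), (2,))   # x and y together fix z
```

Since z is an abstraction of the pair (x, y) but shares nothing with x or y separately, z(x + y) = z while zx and zy both vanish.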

The lattices are relatively complemented. There
exists for x ≤ y a z with

$$z + x = y$$
$$zx = 0.$$

The element z is not, in general, unique.
5. The Delay Free Group $G_1$

The definition of equality for information based on
the group G allows x = y when y is, for example, a delayed
version of x: $y_n = x_{n+a}$. In some situations, when one must
act on information at a certain time, a delay is not permissible.
In such a case we may consider the more restricted
group $G_1$ of instantaneously reversible translations. One may
define inclusion, sum, product, etc., in an analogous way, and
this also leads to a lattice but of much greater complexity
and with many different invariants.


Proof  of  an  Integration  Formula 

C. E.  Shannon 

The integral

$$f_N(\alpha) = \int_0^\alpha \frac{\sin^2 Nx}{\sin^2 x}\, dx \qquad (1)$$

has arisen in an acoustical problem. It has been evaluated for N = 1, 2, 3, 4 as
equal to

$$g_N(\alpha) = \alpha N + \sum_{i=1}^{N-1} \frac{N - i}{i} \sin 2i\alpha \qquad (2)$$

by R. C. Jones, and he has conjectured that $f_N = g_N$ for all α, N. A general
proof follows.

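From (1) we have

$$\Delta_N^2 f_N = f_N - 2 f_{N-1} + f_{N-2} = -\frac{1}{2} \int_0^\alpha \frac{\cos 2Nx - 2\cos 2(N-1)x + \cos 2(N-2)x}{\sin^2 x}\, dx$$

and

$$\frac{d}{d\alpha} \Delta_N^2 f_N(\alpha) = -\frac{1}{2}\, \frac{\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha}{\sin^2 \alpha} \qquad (3)$$
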

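Also from (2)

$$\Delta_N g_N = \alpha + \sum_{i=1}^{N-1} \frac{1}{i} \sin 2i\alpha$$

$$\Delta_N^2 g_N(\alpha) = \frac{\sin 2(N-1)\alpha}{N-1}$$

$$\frac{d}{d\alpha} \Delta_N^2 g_N(\alpha) = 2 \cos 2(N-1)\alpha \qquad (4)$$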
The  equality  of  (3)  and  (4)  can  be  established  by  noting  that  the  numerator  of  (3), 
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Hence 


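$$\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha = \mathrm{Re}\left[e^{j2N\alpha} - 2e^{j2(N-1)\alpha} + e^{j2(N-2)\alpha}\right]$$

$$= \mathrm{Re}\left[e^{j2(N-1)\alpha}\left(e^{j2\alpha} - 2 + e^{-j2\alpha}\right)\right]$$

$$= \mathrm{Re}\left[e^{j2(N-1)\alpha} (2j)^2 \left(\frac{e^{j\alpha} - e^{-j\alpha}}{2j}\right)^2\right]$$

$$= -\mathrm{Re}\left[4 \sin^2 \alpha\, e^{j2(N-1)\alpha}\right] = -4 \sin^2 \alpha \cos 2(N-1)\alpha$$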
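
Both the trigonometric identity and the conjectured equality can be spot-checked numerically. The sketch below is illustrative: it takes f_N as the integral of sin²(Nx)/sin²(x) from 0 to α and g_N as the finite sum, which are the forms implied by (3) and (4), and it uses a plain midpoint rule so that the integrand is never evaluated at x = 0. The function names and the sample values of N and α are assumptions.

```python
from math import cos, sin

def numerator(N, a):
    """cos 2Na - 2 cos 2(N-1)a + cos 2(N-2)a."""
    return cos(2 * N * a) - 2 * cos(2 * (N - 1) * a) + cos(2 * (N - 2) * a)

def f(N, a, steps=20000):
    """Midpoint-rule estimate of the integral of sin^2(Nx)/sin^2(x) over [0, a]."""
    h = a / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (sin(N * x) / sin(x)) ** 2
    return total * h

def g(N, a):
    """Closed form: a*N + sum_{i=1}^{N-1} ((N - i)/i) sin(2 i a)."""
    return a * N + sum((N - i) / i * sin(2 * i * a) for i in range(1, N))

check_identity = numerator(5, 0.7) + 4 * sin(0.7) ** 2 * cos(2 * 4 * 0.7)
check_formula = f(5, 1.0) - g(5, 1.0)
```

Both differences come out at the level of numerical error for the values tried, which is consistent with the proof but of course does not replace it.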


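but $\Delta_N^2 g_N(0) = \Delta_N^2 f_N(0) = 0$, so that

$$\Delta_N^2 g_N(\alpha) = \Delta_N^2 f_N(\alpha);$$

also it has been verified that

$$g_1(\alpha) = f_1(\alpha)$$
$$g_2(\alpha) = f_2(\alpha)$$

Hence it follows in general that

$$g_N(\alpha) = f_N(\alpha).$$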

A  &leit*l  ******  »t  fr^Mlttltac  lafonttttoa 


2t  Is  p*«*lM*  fey  ¥fe*l*u#  of  eodulaUoe  to  Xmr 

pjroto  oao  tutpmt  of  e  oystos  for  *jr&»o*iUia£  Iafor»*Uoa  at  too 
OXpoooo  Of  otters.    Mi«  T*risro«  car.atmeo  *tic*  mj  se  exoasuigfg 

i,     uaitty  of  rocoivo*  oigoel,  ftiiica  ess  bo  rou^iJ/ 

SMMMHtrwS  la  *««HM»  t>/  S&0  tO  £13 1 00 

- 

ratio* 

£•     TtttiiBZi  2 1%9?  yc**r»p. 

S.    tlm  of  troossUooi£A» 

ft.     BoiOO  4*4  t&O  OJKfeOtt* 

aoooroX  tteojr*  of  bow  tfeooo  voriofcioo  oro  roiotoO  «*4  tSm 

liivwi»«d  oafi  will  oe  &«volopo4  la  a  forthoofclas  soaorwifim. 
Bo»oo«r  «poofcitt&  x-.Ht*M/  *&4  oa&or  «  sus&ber  of  oojJUioay  0001*09-  - 

f ol2ooXm«  e^ufitioos 

a  ■  f  if  y  10  {*) 

3  *  «  aooouro  Of  4ii*t0rtiGji  at  tftt  **««tv*r 

t  *         *f  trooonlooiaa 

*  •  bsaa  iriiia  ©f  tro-ts&ittor 

ST  *  aciso  j-«w«T  £*30|t?fl   ti:«t  1»  t&O  O&iOO  ?OW*r 
p#r  *Ait  tw?.i4  oil  Hi,  *>*«&r*e»  tolas 

alalia  *s  flfci  is  toe  rofii«»  u^At-?  *fi>.:mlaar*tioa 


yjUUi  ftmi  tautt  koojMtag  rooolToft  <|ooli*jr  istojr&ottt 
oo  aor  0100010  t,      F  «M  £  1a  r*rio*»o         o*>  loo*  ft*  oo 
kooo  tl*o  gpam  ©f  t&«  foooHoo* 


r  1 21 


«fcoro  £«*  an£  %  or«  too  WUl  triuioatttor  tatar  ao4  acl«o 
QJQjSgf,  **ria«  too  traaftftlsalast  tiao.    ^»  fcr  •sa«pl«  t/jr  to- 
oroosiog  btutf  wUto  oo  ooo  eoorofioo  tra&o&ittor  -  tU« 

m&a&m&t  10  la  «a«  ooaoo  vor*  foooroolo  »iae*  It  lit  «  log- 
aritt.ai«  *moj  o**lag  aulto  or  boaA  oJUitfc  AlvMoo  t&o  o*or«r 
»jf  a  ft*  tor. 

»ro  two  »*tbfld«  of  fetter Sag  o1&ao1  *»  aaloo  rotlo  «t  too  ox»«ooo 
of  boo*  «i*to.    BoltOor  of  titooo  Jkwovo*  Is  by  oor  msw*  eftUud 
l&  too  ozobooso.   Sfco  $roooal  aoKomotoa  toooriooo  o  sow  ootfaoo 
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not  «oo£  toot  «t«  ftfotoa  of  troaoaiooieo  lo  •  tooorotioaHf 
Uool  ono  for  tkoro  oro  oororol  otHor  aooo*  of  iss$*miM*  ro- 
ooivoi  qooJLU*  fcooola*  f .  *.  ?  *o&  *  flxoi  -  «**t  tfclo  oro  too 
to  to  yWlt  m  ooarlr  tAool  oireonago  roto  ootooo*  too 


anlM  1m  Oaa^L  fift  Um  of  OOOlloo  fcfa*  YOl&OC 

of  too  lopot  ytoolotlag  fomoUoa  (too  o$oooa  faootloo  la  tolo- 
saoao  oaa  roftle)  ot  o  00300000  of  rofolorXr  ooboo*  oooyllat 


Thus     t«8  +  4~£**l  , 
Oi     *5  --«  4-4-2  +  1 

A transmitter for this system could be built in the
following way. A condenser is charged as usual to the sampled
voltage. This voltage is read on a comparator biased up to
half the [illegible]. If the comparator gives a positive indication
an electronic switch is closed feeding a negative pulse of 2^n
units of charge into the condenser; if not a positive pulse of
2^n units is fed in. The comparator is now switched to control
a new pulse source which produces pulses of 2^(n-1) units and the
process is repeated. Thus the circuit feeds in positive or
negative pulses of decreasing magnitude "hunting" for a balance.
At each stage a recorder remembers whether a positive or negative
pulse was used. These positive and negative recordings actually
are the binary representation of the original voltage, as one
can see by reading the above table with -1 replaced by 0. Hence
the receiver of Fig. 4 can be used without alteration in this
system.
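
The hunting procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the comparator is taken to compare the remaining charge against zero (the bias mentioned in the text is not modeled), the names `encode`, `decode` and `to_binary` are hypothetical, and the digit recorded at each stage is the sign of the remaining charge, that is, the opposite of the sign of the pulse fed in.

```python
def encode(voltage, top_exponent):
    """Hunt for a balance with pulses of 2^n, 2^(n-1), ..., 1 units.

    Returns the recorded digits (+1 or -1, largest pulse first) and the
    charge left on the condenser after the last pulse.
    """
    residual = voltage
    digits = []
    for k in range(top_exponent, -1, -1):
        if residual > 0:
            residual -= 2 ** k      # comparator positive: feed a negative pulse
            digits.append(+1)
        else:
            residual += 2 ** k      # otherwise feed a positive pulse
            digits.append(-1)
    return digits, residual


def to_binary(digits):
    """Read the recorded digits with -1 replaced by 0."""
    return [1 if d > 0 else 0 for d in digits]


def decode(digits):
    """Rebuild the value as a signed sum of powers of two."""
    n = len(digits)
    return sum(d * 2 ** (n - 1 - i) for i, d in enumerate(digits))


digits, residual = encode(9, 3)
print(digits, to_binary(digits), decode(digits), residual)
```

With four pulse sizes (8, 4, 2, 1) the signed sum can only be an odd integer between -15 and 15; any other input leaves a nonzero residual, which in this sketch plays the part of the quantizing error.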
Creative Thinking

Up to 100% of the amount of ideas produced, useful good
ideas produced by these signals, these are supposed to be arranged
in order of increasing ability. At producing ideas, we find a
curve something like this. Consider the number of curves produced
here - going up to enormous height here.

A very small percentage of the population produces the
greatest proportion of the important ideas. This is akin to an
idea presented by an English mathematician, Turing, that the human
brain is something like a piece of uranium. The human brain, if
it is below the critical lap and you shoot one neutron into it,
additional more would be produced by impact. It leads to an
extremely explosive [...] of the issue, increase the size of
the uranium. Turing says this is something like ideas in the human
brain. There are some people if you shoot one idea into the brain,
you will get a half an idea out. There are other people who are
beyond this point at which they produce two ideas for each idea
sent in. Those are the people beyond the knee of the curve. I
don't want to sound egotistical here, I don't think that I am
beyond the knee of this curve and I don't know anyone who is. I
do know some people that were. I think, for example, that anyone
will agree that Isaac Newton would be well on the top of this
curve. When you think that at the age of 25 he had produced enough
science, physics and mathematics to make 10 or 20 men famous - he
produced binomial theorem, differential and integral calculus, laws
of gravitation, laws of motion, decomposition of white light, and
so on. Now what is it that shoots one up to this part of the
curve? What are the basic requirements? I think we could set down
three things that are fairly necessary for scientific research or
for any sort of inventing or mathematics or physics or anything
along that line. I don't think a person can get along without any
one of these three.

The first one is obvious - training and experience.
You don't expect a lawyer, however bright he may be, to give you
a new theory of physics these days or mathematics or engineering.

The second thing is a certain amount of intelligence or
talent. In other words, you have to have an IQ that is fairly high to do
good research work. I don't think that there is any good engineer
or scientist that can get along on an IQ of 100, which is the
average for human beings. In other words, he has to have an IQ
higher than that. Everyone in this room is considerably above
that. This, we might say, is a matter of environment; intelligence
is a matter of heredity.

Those two I don't think are sufficient. I think there is
a third constituent here, a third component which is the one that
makes an Einstein or an Isaac Newton. For want of a better word,
we will call it motivation. In other words, you have to have some
kind of a drive, some kind of a desire to find out the answer, a
desire to find out what makes things tick. If you don't have that,
you may have all the training and intelligence in the world, you
don't have questions and you won't just find answers. This is a
hard thing to put your finger on. It is a matter of temperament
probably; that is, a matter of probably early training, early childhood
experiences, whether you will motivate in the direction of scientific
research. I think that at a superficial level, it is blended
use of several things. This is not any attempt at a deep analysis at
all, but my feeling is that a good scientist has a great deal of what
we can call curiosity. I won't go any deeper into it than that. He
wants to know the answers. He's just curious how things tick and he
wants to know the answers to questions; and if he sees things, he wants
to raise questions and he wants to know the answers to those.

Then there's the idea of dissatisfaction. By this I don't
mean a pessimistic dissatisfaction of the world - we don't like the
way things are - I mean a constructive dissatisfaction. The idea
could be expressed in the words, "This is OK, but I think things could
be done better. I think there is a neater way to do this. I think
things could be improved a little." In other words, there is continually
a slight irritation when things don't look quite right; and
I think that dissatisfaction in present days is a key driving force
in good scientists.

And another thing I'd put down here is the pleasure in seeing
net results or methods of arriving at results needed, designs of
engineers, equipment, and so on. I get a big bang myself out of proving
a theorem. If I've been trying to prove a mathematical theorem for
a week or so and I finally find the solution, I get a big bang out of
it. And I get a big kick out of seeing a clever way of doing some
engineering problem, a clever design for a circuit which uses a very
small amount of equipment and gets apparently a great deal of result
out of it. I think so far as motivation is concerned, it is maybe a
little like Fats Waller said about swing music - either you got it or
you ain't. If you ain't got it, you probably shouldn't be doing research
work if you don't want to know that kind of answer. Although
people without this kind of motivation might be very successful in
other fields, the research man should probably have an extremely
strong drive to want to find out the answers, so strong a drive that
he doesn't care whether it is 5 o'clock - he is willing to work all
night to find out the answers and all weekend if necessary.

Well now, this is all well and good, but supposing a person has these
three properties to a sufficient extent to be useful, are there any
tricks, any gimmicks that he can apply to thinking that will actually
aid in creative work, in getting the answers in research work, in general,
in finding answers to problems? I think there are, and I think
they can be catalogued to a certain extent. You can make quite a list
of them and I think they would be very useful if one did that, so I
am going to give a few of them which I have thought up or which people
have suggested to me. And I think if one consciously applied
these to various problems you had to solve, in many cases you'd find
solutions quicker than you would normally or in cases where you might
not find it at all. I think that good research workers apply these
things unconsciously; that is, they do these things automatically
and if they were brought forth into the conscious thinking that here's
a situation where I would try this method of approach that would
probably get there faster, although I can't document this statement.

The first one that I might speak of is the idea of simplification.
Suppose that you are given a problem to solve, I don't
care what kind of a problem - a machine to design, or a physical
theory to develop, or a mathematical theorem to prove, or something
of that kind - probably a very powerful approach to this
is to attempt to eliminate everything from the problem except the
essentials; that is, cut it down to size. Almost every problem
that you come across is befuddled with all kinds of extraneous
data of one sort or another; and if you can bring this problem
down into the main issues, you can see more clearly what you're
trying to do and perhaps find a solution. Now, in so doing, you
may have stripped away the problem that you're after. You may have
simplified it to a point that it doesn't even resemble the problem
that you started with; but very often if you can solve this simple
problem, you can add refinements to the solution of this until you
get back to the solution of the one you started with.

A very similar device is seeking similar known problems.
I think I could illustrate this schematically in this way. You
have a problem P here and there is a solution S which you do not know
yet perhaps over here. If you have experience in the field represented,
that you are working in, you may perhaps know of a somewhat
similar problem, call it P', which has already been solved and
which has a solution, S'. All you need to do - all you may have
to do is to find the analogy from P' here to P and the same analogy
from S' to S in order to get back to the solution of the given problem.
This is the reason why experience in a field is so important
that if you are experienced in a field, you will know thousands of
problems that have been solved. Your mental matrix will be filled
with P's and S's unconnected here and you can find one which is
tolerably close to the P that you are trying to solve and go over
to the corresponding S' in order to go back to the S you're after.
It seems to be much easier to make two small jumps than the one big
jump in any kind of mental thinking.

Another approach for a given problem is to try to restate
it in just as many different forms as you can. Change the words.
Change the viewpoint. Look at it from every possible angle. After
you've done that, you can try to look at it from several angles at
the same time and perhaps you can get an insight into the real basic
issues of the problem, so that you can correlate the important factors
and come out with the solution. It's difficult really to do
this, but it is important that you do. If you don't, it is very
easy to get into ruts of mental thinking. You start with a problem
here and you go around a circle here and if you could only get over
to this point, perhaps you would see your way clear; but you can't
break loose from certain mental blocks which are holding you in
certain ways of looking at a problem. That is the reason why very
frequently someone who is quite green to a problem will sometimes
come in and look at it and find the solution like that, while you
have been laboring for months over it. You've got set into some
ruts here of mental thinking and someone else comes in and sees it
from a fresh viewpoint.

Another mental gimmick for aid in research work, I think,
is the idea of generalization. This is very powerful in mathematical
research. The typical mathematical theory developed in the following
way to prove a very isolated, special result, particular theorem
- someone always will come along and start generalizing it. He
will leave it where it was in two dimensions before he will do it in
N dimensions; or if it was in some kind of algebra, he will work in
a general algebraic field; if it was in the field of real numbers, he
will change it to a general algebraic field or something of that sort.
This is actually quite easy to do if you only remember to do it. If
the minute you've found an answer to something, the next thing to do
is to ask yourself if you can generalize this any more - can I make
the same, make a broader statement which includes more - there, I
think, in terms of engineering, the same thing should be kept in mind.
As you see, if somebody comes along with a clever way of doing something,
one should ask oneself "Can I apply the same principle in
more general ways? Can I use this same clever idea represented here
to solve a larger class of problems? Is there any place else that
I can use this particular thing?"

Next one I might mention is the idea of structural analysis
of a problem. Supposing you have your problem here and a solution
here. You may have too big a jump to take. What you can try to
do is to break down that jump into a large number of small jumps.
If this were a set of mathematical axioms and this were a theorem
or conclusion that you were trying to prove, it might be too much
for me to try to prove this thing in one fell swoop. But perhaps
I can visualize a number of subsidiary theorems or propositions
such that if I could prove those, in turn I would eventually arrive
at this solution. In other words, I set up some path through this
domain with a set of subsidiary solutions, 1, 2, 3, 4, and so on,
and attempt to prove this on the basis of that and then this on the
basis of these which I have proved until eventually I arrive at the
path S. Many proofs in mathematics have been actually found by
extremely roundabout processes. A man starts to prove this theorem
and he finds that he wanders all over the map. He starts off and
proves a good many results which don't seem to be leading anywhere
and then eventually ends up by the back door on the solution of the
given problem; and very often when that's done, when you've found
your solution, it may be very easy to simplify; that is, to see at
one stage that you may have short-cutted across here and you could
see that you might have short-cutted across there. The same thing
is true in design work. If you can design a way of doing something
which is obviously clumsy and cumbersome, uses too much equipment;
but after you've really got something you can get a grip on, something
you can hang on to, you can start cutting out components and
seeing some parts were really superfluous. You really didn't need
them in the first place.

Now one other thing I would like to bring out which I
run across quite frequently in mathematical work is the idea of
inversion of the problem. You are trying to obtain the solution
S on the basis of the premises P and then you can't do it. Well,
turn the problem over supposing that S were the given proposition,
the given axioms, or the given numbers in the problem and what you
are trying to obtain is P. Just imagine that that were the case.
Then you will find that it is relatively easy to solve the problem
in that direction. You find a fairly direct route. If so, it's
often possible to invert it in small batches. In other words, you've
got a path marked out here - there you got relays you sent this way.
You can see how to invert these things in small stages and perhaps
three or four only difficult steps in the proof.

Now I think the same thing can happen in design work.
Sometimes I have had the experience of designing computing machines
of various sorts in which I wanted to compute certain numbers out of
certain given quantities. This happened to be a machine that played
the game of nim and it turned out that it seemed to be quite difficult.
It took quite a number of relays to do this particular calculation
although it could be done. But then I got the idea that if
I inverted the problem, it would have been very easy to do - if the
given and required results had been interchanged; and that idea led
to a way of doing it which was far simpler than the first design.
The way of doing it was doing it by feedback; that is, you start with
the required result and run it back until - run it through its value
until it matches the given input. So the machine itself was worked
backward putting range S over the numbers until it had the number
that you actually had and, at that point, until it reached the number
such that P shows you the correct way. Well, now the solution
for this philosophy which is probably very boring to most of you.
I'd like now to show you this machine which I brought along and go
into one or two of the problems which were connected with the design
of that because I think they illustrate some of these things I've been
talking about.
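
The idea of inverting a calculation by feedback can be sketched in Python. The talk does not give the circuit of the nim machine, so what follows is an assumption-laden illustration and not a description of that machine: instead of computing the winning move directly from the piles, it steps through candidate results and checks each one in the easy direction, stopping when the check matches. The function names `nim_sum` and `winning_move_by_search` are hypothetical, and the check used (a position is safe to leave when the exclusive-or of the pile sizes is zero) is the standard theory of nim.

```python
from functools import reduce
from operator import xor


def nim_sum(piles):
    """Exclusive-or of the pile sizes; zero marks a position safe to leave."""
    return reduce(xor, piles, 0)


def winning_move_by_search(piles):
    """Run through candidate results until one passes the easy check.

    Returns (pile_index, new_size) for the first move that leaves a
    zero nim-sum, or None when no such move exists.
    """
    for i, size in enumerate(piles):
        for new_size in range(size):                # every legal reduction
            trial = piles[:i] + [new_size] + piles[i + 1:]
            if nim_sum(trial) == 0:                 # the easy direction
                return i, new_size
    return None


print(winning_move_by_search([3, 4, 5]))
```

The search costs more steps than a direct calculation, but each step is a simple check, which is the trade described in the talk.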

In order to see this, you'll have to come up around it; so,
I wonder whether you will all come up around the table now.

ABSTRACT

This  memorandum  describes  a  machine  (made  of 
relays,  selector  switches,  gas  diodes,  and  germanium  diodes) 
for  analyzing  several  properties  of  any  combinational  relay 
circuit  which  uses  four  relays  or  fewer. 

This  machine,  called  the  relay  circuit  analyzer, 
contains  an  array  of  switches  on  which  the  specifications 
that  the  circuit  is  expected  to  satisfy  can  be  indicated,  as 
well  as  a  plugboard  on  which  the  relay  circuit  to  be  analyzed 
can  be  set  up. 

The analyzer can (1) verify whether the circuit
satisfies the specifications, (2) make certain kinds of
attempts to reduce the number of contacts used, and also
(3) perform rigorous mathematical proofs which give lower
bounds for the numbers and types of contacts required to
satisfy given specifications.

1.  Introduction

Some operations which assist in the design of relay
circuits or other types of switching circuits can be described
in very simple form, and machines can be constructed which perform
them more quickly and more accurately than a human being
can. It seems possible that machines of this type will be useful
to those whose work involves the design of such circuits.
This is the first of two memoranda describing particular machines
of this kind which have been built.

The present machine, called the relay circuit
analyzer, is intended for use in connection with the design of
two terminal circuits made up of contacts on at most four relays.

The principles upon which this machine is based are
not limited to two terminal networks or to four relays, although
an enlarged machine would require more time to operate. Each
addition of one relay to the circuits considered would approximately
double the size of the machine and quadruple the length
of time required for its operation.

This type of machine is not applicable to sequential
circuits, however, so it will be of use only in connection with
parts of the relay circuits which contain contacts, but no relay
coils.

2.    Operation  of  the  Machine 

The machine, as can be seen from Photograph 196492,
contains sixteen 3-position switches, which are used to specify
the requirements of the circuit. One switch corresponds to each
of the 2^4 = 16 states in which the four relays can be put. Switch
No. 2 in the upper righthand corner, for instance, is labeled
W + X + Y' + Z, which corresponds to the state of the circuit
in which the relays labeled W, X, and Z are operated, and the
relay labeled Y is released.


The three positions of this switch correspond to the
requirements which can be imposed on the condition of the circuit
when the relays are in the corresponding state. Since any
single relay contact circuit assumes only one of two values
(open or closed), the inclusion of a third value (doesn't matter,
don't care, or vacuous, as it has been called by various persons)
merits some explanation. If the machine, of which the
relay circuit being designed is to be a part, only permits these
relays to take on a fraction of the 2^n combinations of which n
relays are capable, then (except when considering what the machine
will do in case of relay failures) any circuits which agree
on the combinations actually assumed will be equivalent in their
properties. Since the class of circuits which agree with what
is wanted just in the necessary combinations is larger than the
class of those which agree in all combinations, the former
class can and frequently will contain members using fewer contacts.
Hence the switch corresponding to each state is put
into the don't care position if the circuit will never assume
that state, or if for any other reason the behavior when in
that state is immaterial. The sixteen 3-position switches thus
permit the user not only to require the circuit under consideration
to have exactly some particular hindrance function, but
also allow the machine more freedom in the cases where the circuit
need not be specified completely.

In order to make a machine of this type to deal
with n relays (this particular machine was made for the case
n = 4), 2^n such switches would be required, corresponding to
the 2^n states n relays can assume. In each of these states
the circuit can be either open or closed, so there are 2^(2^n)
functionally distinct circuits. But since each switch has
3 positions, there are 3^(2^n) distinct circuit requirements
specifiable on the switches, which in the case n = 4 amounts to
43,046,721. Thus, the number of problems which the analyzer
must deal with is quite large, even in the case of only four
relays.
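
These counts are easy to check numerically. The short Python sketch below (the function name `analyzer_counts` is hypothetical) tabulates, for n relays, the number of relay states, of functionally distinct circuits, and of requirement settings available on 3-position switches.

```python
def analyzer_counts(n):
    """Return (states, distinct_functions, specifications) for n relays."""
    states = 2 ** n                 # each relay is either operated or released
    functions = 2 ** states         # circuit open or closed in every state
    specifications = 3 ** states    # open, closed, or don't care on each switch
    return states, functions, specifications


for n in range(1, 5):
    print(n, analyzer_counts(n))
```

For n = 4 the last figure is 3^16 = 43,046,721, in agreement with the text.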


The left half of the front panel of the machine (see
Photograph No. 196492) is a plugboard on which the circuit being
analyzed can be represented. There are three transfers
from each of the four relays, W, X, Y, and Z brought out to
jacks on this panel, and two plugs representing the terminals
of the network are at the top and bottom. Using these, as
well as some patch cords, it is possible to plug up any circuit
using at most three transfers on each of the four relays.
This number of contacts is sufficient to give a circuit representing
any switching function of four variables.


If the specifications for the circuit have been put
on the sixteen switches, and if the circuit has been put on
the plugboard, the relay circuit analyzer is then ready to
operate.

With the main control switch and the evaluate-compare
switch both in the evaluate position, pressing the start
button will cause the analyzer to evaluate the circuit plugged
in, that is, to indicate in which of the states the circuit is
closed by lighting up the corresponding indicator lamps.

Turning the evaluate-compare switch to compare
position, the analyzer then checks whether the circuit disagrees
with the requirements given on the switches. A disagreement
is indicated by lighting the lamp corresponding to
the state in question. If a switch is set for closed and the
actual circuit is open in that state, or vice versa, a disagreement
is indicated, but no disagreement is ever registered
in a state for which the switch is set in the don't care position.
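
The evaluate and compare operations can be mimicked in software. The Python sketch below is an illustration only, not the machine's circuit: a two-terminal network is assumed to be given as a list of contacts `(node_a, node_b, relay, kind)`, where `kind` is `"make"` (closed when the relay is operated) or `"break"` (closed when it is released), the terminals are assumed to be named `"top"` and `"bottom"`, and the switch settings are represented as `True` (closed), `False` (open) or `None` (don't care). All of these names are hypothetical.

```python
from itertools import product

RELAYS = "WXYZ"


def is_closed(contacts, state, terminals=("top", "bottom")):
    """True if some path of closed contacts joins the two terminals."""
    adjacent = {}
    for a, b, relay, kind in contacts:
        if state[relay] == (kind == "make"):      # contact closed in this state
            adjacent.setdefault(a, set()).add(b)
            adjacent.setdefault(b, set()).add(a)
    start, goal = terminals
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for nxt in adjacent.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def all_states():
    """Yield the sixteen relay states as dictionaries relay -> operated?"""
    for bits in product([False, True], repeat=len(RELAYS)):
        yield dict(zip(RELAYS, bits))


def evaluate(contacts):
    """Map each state, written as a (W, X, Y, Z) tuple, to closed or open."""
    return {tuple(s.values()): is_closed(contacts, s) for s in all_states()}


def compare(contacts, spec):
    """List the states in which the circuit disagrees with the switches."""
    actual = evaluate(contacts)
    return [st for st, want in spec.items()
            if want is not None and actual[st] != want]


# A make contact on W in series with a make contact on X.
series = [("top", "m", "W", "make"), ("m", "bottom", "X", "make")]
spec = {st: (st[0] and st[1]) for st in evaluate(series)}
print(compare(series, spec))
```

A state whose switch is set to `None` never appears in the list returned by `compare`, which mirrors the rule that no disagreement is registered for a don't care setting.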

If the main control switch is then turned
to the short test position and the start button is pressed again,
the analyzer determines whether any of the contacts in this
circuit could have been shorted out, with the circuit still
satisfying the requirements. The machine indicates on the lamps
beside the contacts which ones have this property.

It may be surprising to the reader that anyone would
ever need the assistance of a machine to find a contact which
could be shorted out without affecting the circuit. While this
is certainly true of simple examples, in more complicated circuits
the redundant elements are often far from obvious, particularly
if there are some states for which the switches are
in the don't care position, since the simplified circuit may be
different only in the don't care state. It is often quite difficult
to see the simplification in these cases.

The analyzer is also helpful in case the circuit being
analyzed is a bridge, because of the complications involved
in tracing out all paths in the bridge. The circuit shown in
Figure 1 is an example of a circuit which was not known to be
inefficiently designed until put on the analyzer. It determined
in less than two minutes (including the time required to plug
the circuit into the plugboard) that one of the contacts shown
can be shorted out. How likely would a human being be to solve
this same problem in the same length of time?


After the short test has been performed, putting
the main control switch in the open test position permits the
analyzer to perform another analogous test, this time opening
the contacts one at a time.
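
The short test and the open test can be imitated in the same spirit. The Python sketch below is an illustration under assumptions and not the machine's wiring: contacts are tuples `(node_a, node_b, relay, kind)` with `kind` equal to `"make"`, `"break"`, or `"wire"` (a shorted contact), the terminals are named `"top"` and `"bottom"`, and the specification maps a `(W, X, Y, Z)` tuple of booleans to `True` (closed), `False` (open) or `None` (don't care). All names are hypothetical.

```python
from itertools import product

RELAYS = "WXYZ"


def is_closed(contacts, state):
    """Grow the set of nodes reachable from 'top' through closed contacts."""
    reached, changed = {"top"}, True
    while changed:
        changed = False
        for a, b, relay, kind in contacts:
            closed = kind == "wire" or state[relay] == (kind == "make")
            if closed and (a in reached) != (b in reached):
                reached |= {a, b}
                changed = True
    return "bottom" in reached


def satisfies(contacts, spec):
    """Check the circuit in all sixteen states, skipping don't care states."""
    for bits in product([False, True], repeat=len(RELAYS)):
        want = spec.get(bits)                       # None means don't care
        if want is None:
            continue
        if is_closed(contacts, dict(zip(RELAYS, bits))) != want:
            return False
    return True


def short_test(contacts, spec):
    """Indices of contacts that could be shorted out, one at a time."""
    result = []
    for i, (a, b, relay, _kind) in enumerate(contacts):
        trial = contacts[:i] + [(a, b, relay, "wire")] + contacts[i + 1:]
        if satisfies(trial, spec):
            result.append(i)
    return result


def open_test(contacts, spec):
    """Indices of contacts that could be opened (removed), one at a time."""
    return [i for i in range(len(contacts))
            if satisfies(contacts[:i] + contacts[i + 1:], spec)]


# W make in series with two parallel X make contacts (one is redundant).
circuit = [("top", "a", "W", "make"),
           ("a", "bottom", "X", "make"),
           ("a", "bottom", "X", "make")]
# Requirements only matter when W is operated; other states are don't care.
spec = {bits: (bits[1] if bits[0] else None)
        for bits in product([False, True], repeat=4)}
print(short_test(circuit, spec), open_test(circuit, spec))
```

In this example the W contact can be shorted only because every state with W released is marked don't care, which is the situation the text singles out as hard to see by inspection.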

These two particular types of circuit changes were
chosen because they are easy to carry out, and whenever successful,
either one reduces the number of contacts required.
There are other types of circuit simplification which it might
be desirable to have a machine perform, including various
rearrangements of the circuit. These would have required
more time as well as more equipment to perform, but would
probably have caused the machine to be more frequently successful
in simplifying the circuit. Using such techniques,
it might be possible to build a machine which could design
circuits efficiently starting from basic principles, perhaps
by starting with a complete Boolean expansion for the desired
function and simplifying it step by step. Such a machine
would be rather slow (unless it were built to operate at
electronic speeds, and perhaps even in this case), and not
enough planning has been done to know whether such a machine
is practically feasible, but the fact that such a machine is
theoretically possible is certainly of interest, whether anyone
builds one or not.

Another question of theoretical interest is whether
a logical machine could be built which could design an improved
version of itself, or perhaps build some machine whose
over-all purpose was more complicated than its own. There
seems to be no logical contradiction involved in such a machine,
although it will require great advances in the general
theory of automata before any such project could be confidently
undertaken.

To return to the relay circuit analyzer, a final
operation which it performs is done with the main control
switch in the prove position. When the start button is pressed
and the other 4-position switch is moved successively through the
W, X, Y, and Z positions, certain of the eight lamps
W, W', X, X', Y, Y', Z, Z' will light up. The analyzer has
carried out a proof as to which kinds of contacts are required
to synthesize the function using the method of reduction to
functions of one variable, which will be explained in a forthcoming
memorandum. The analyzer here ignores whatever circuit
has been plugged in the plugboard, and considers only the function
specified by the sixteen 3-position switches. If every
circuit which satisfies these specifications requires a back
contact on the W relay, the W' light will go on, etc.

If, for instance, seven of the eight lights are on,
any circuit for the function requires at least seven contacts,
and if there is in fact a circuit which uses just seven, the
machine has, in effect, given a complete proof that this circuit
is minimal. Circuits for which the machine can give such
a complete proof are fairly common, although there are also
circuits (which can be shown to be minimal by more subtle methods
of proof) which this machine could not prove minimal.
An example is the circuit of Figure 1. This can be simplified
by the analyzer to a circuit of nine contacts, but in
the prove position the analyzer merely indicates that at least
eight contacts are necessary. It can be shown by other methods
that the 9-contact circuit is minimal. But at any rate,
the analyzer always gives a mathematically rigorous lower
bound for the number of contacts.

3. The Circuit and Operation of the Relay Circuit Analyzer

A complete circuit diagram of the analyzer is shown
in Figures 2 and 3. The circuit, as already mentioned, has
five modes of operation: (1) evaluating a circuit, (2) comparing
a circuit with desired characteristics, (3) examining
a circuit for contacts that can be shorted without affecting
operation, (4) examining for contacts that can be opened without
affecting operation, and (5) proving that certain contacts
are necessary in any realization of the function. The
method of operation of the circuit will be described in turn
for each of these five modes of behavior.

4.    Evaluation  of  a  Circuit 

In  this  mode  of  operation  the  machine  goes  through 
in  sequence  the  sixteen  possible  states  of  the  relays  W,  X,  Y 
and  Z,  that  are  involved  in  the  circuit  and  tests  in  each  state 
whether  or  not  the  circuit  is  closed.    If  it  is  closed,  the 
corresponding  panel  light  is  lit.    In  this  process  only  the 
right-hand  part  of  the  circuit  in  Figure  2  is  involved  and 
switches S18 and S19 are both in the evaluate position. The
selector  switch  S17  goes  through  one  complete  revolution  to 
make  this  test.    During  this  revolution  the  four  relays  W,  X, 
Y,  and  Z  proceed  sequentially  through  their  sixteen  states. 
This  sequence  is  produced  by  the  first  two  wipers  and  decks 
of  the  selector  switch  S17.    At  the  first  position  (0000) 
all  four  relays  are  unoperated.    At  the  second  step  (0001), 
ground  on  the  second  wiper  operates  relay  Z,  which  locks  in 
on  its  own  front  contact.     The  circuit  is  then  set  to  test 
the situation where W, X and Y are unoperated and Z is operated.
At the third step relay Y is operated and locks in on
its own front contact, giving the state 0011. At the fourth step
Z is short-circuited by the wiper of the first deck. This releases
Z and produces the state 0010. Proceeding in this manner it will
be seen that the four relays W, X, Y and Z go through the sixteen
states indicated. The circuit which is being tested may be thought
of as being connected between plugs P1 and P2 at the upper
left of the diagram. This network consists of contacts on
the four relays W, X, Y and Z. Actually some other contacts
are involved in the network between P1 and P2 (contacts on
the H relays) but in the present mode of operation these H
relays do not operate and do not affect the hindrance from
P1 to P2. For a given state of the relays W, X, Y and Z the
plugs P1 and P2 will be connected together if, and only if,
the circuit being tested is closed for that state of the relays.
The relay G will, therefore, operate if, and only if,
the  circuit  is  closed  in  the  state  in  question.    If  it  is 
closed,  a  ground  will  be  applied  to  the  third  wiper  of  the 
selector  switch  S17  and  this  will  fire  the  corresponding 
neon  lamp.    If  it  is  not  closed  +34  volts  will  be  applied 
to  the  lamp  extinguishing  it  (if  it  is  already  fired).  The 
voltage  across  the  lamp  circuit,  64-24  or  about  60  volts, 
lies  between  the  fire  and  sustain  voltages  for  the  neon 
lamps.  Consequently,  if  they  are  fired  they  will  remain 
fired,  if  extinguished  they  will  remain  out.    Thus  the  lamps 
remain  in  the  state  produced  by  the  evaluation  of  the  cir- 
cuit even  after  the  wiper  has  left  the  point  in  question. 
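The evaluation sweep can be sketched in a few lines of Python. This is only an illustration of the logic, not a model of the wiring. The circuit under test is represented by a hypothetical Boolean function `circuit(w, x, y, z)`, and the order of the sixteen states is assumed to be the reflected Gray code. That assumption agrees with the first four steps described above (0000, 0001, 0011, 0010), but the text does not confirm it for the remaining twelve.

```python
def gray_sequence(bits=4):
    """Return the 2**bits states in reflected Gray-code order as bit tuples."""
    states = []
    for i in range(2 ** bits):
        g = i ^ (i >> 1)  # standard binary-to-Gray conversion
        states.append(tuple((g >> (bits - 1 - k)) & 1 for k in range(bits)))
    return states


def evaluate(circuit):
    """Return {state: lamp_lit} for all sixteen states of W, X, Y, Z."""
    lamps = {}
    for w, x, y, z in gray_sequence():
        # Relay G operates, and the lamp fires, only if the circuit is closed.
        lamps[(w, x, y, z)] = bool(circuit(w, x, y, z))
    return lamps


def example_circuit(w, x, y, z):
    # Hypothetical circuit: a W make contact in series with the parallel
    # combination of an X make contact and a Y break contact.
    return w and (x or not y)


if __name__ == "__main__":
    for state, lit in evaluate(example_circuit).items():
        print("".join(map(str, state)), "lit" if lit else "dark")
```

Because successive Gray-code states differ in one relay only, each step of the selector switch has to operate or release just one relay, which is consistent with the locking and short-circuiting scheme described above.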

The  movement  of  the  stepping  switch  is  produced  by 
a  three-stage  buzzer  circuit  consisting  of  relays  U,  V  and  P. 
In the buzzing condition the parallel S' and T' combination
in series with U will be closed. The operation of U energizes
V through the front U contact in series with the V
coil. The operation of V then operates P in a similar manner.
The operation of P releases U through the P' contact. This
releases V, which releases P, etc.
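The buzzer behaves as a ring of three relays, each following the one before it. A minimal sketch of that cycle, assuming an idealized model in which exactly one relay changes per time step and the S' and T' path stays closed (the function name is hypothetical):

```python
def buzzer_states(steps):
    """Return successive (U, V, P) relay states of the three-stage buzzer.

    Idealized model: one relay changes per step. U follows the back
    contact P', V follows the front contact of U, P follows V.
    """
    u = v = p = False
    history = []
    for _ in range(steps):
        if u != (not p):
            u = not p   # U is energized only while P is released
        elif v != u:
            v = u       # V operates or releases after U
        elif p != v:
            p = v       # P operates or releases after V
        history.append((u, v, p))
    return history
```

In this model the ring repeats every six steps, and P operates once per cycle, which is what allows a make contact on P to pulse the selector switch once per cycle.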

At the start of an evaluation, switch S18 will be
in the evaluate position, switch S19 in the evaluate position,
selector switch S17 at position 22 (and relay S, therefore,
operated) and selector switch S16 at position 21 (with relay T,
therefore, operated). When the starting push button S20 is
pressed magnet M1 of stepping switch 1 is energized. When
S20 is released M1 releases and the stepping switch moves to
position one. This releases relay S and the three-stage
buzzer U, V, P starts operating. At each cycle of this buzzer
the coil of selector switch S17 is energized and released
by a make contact on the P relay. This sequences the relays
W, X, Y and Z through their sixteen states, as already described,
and indicates on the neon lamps the states for which
the circuit being tested is closed. When the wipers reach
level 22 relay S operates, stopping the buzzer and ending the
test.

5. The Comparison Mode of Operation

In  this  mode  of  operation  the  circuit  set  up  on  the 
plugboard  is  to  be  compared  with  the  settings  of  the  sixteen 
three-position  switches.     If  in  any  state  the  circuit  disagrees 
with  the  switch  setting  the  corresponding  neon  lamp  will  light 
up.    For  this  test  switch  S18  is  set  in  the  evaluate  position 
and  switch  S19  in  the  compare  position.    When  the  starting  push 
button  S20  is  pressed,  the  buzzing  circuit  U,  V,  P  starts  as 
before,  cycling  the  selector  switch  S17  through  one  complete 
revolution. The four relays, as before, go through their sixteen
possible states and the relay G, as before, operates or
not, depending on whether the circuit being tested is closed
or not. The lamps, however, are no longer controlled directly
by the relay G, but instead by contacts on the relay A. The
relay A is connected to operate if, and only if, the circuit
condition of the network being tested (open or closed) disagrees
with the setting of the corresponding three-position
switch. This result is obtained by having one end of the coil
of relay A connected (via the fourth wiper of selector switch
S17) to +24 volts, nothing (i.e. floating) or minus, according
as the desired behavior of the circuit in the state in question
is open, "don't care", or closed (as represented by the setting
of the three-position switch). The other end of the relay A
is connected to +24 volts or minus, according as the actual
circuit under test is open or closed (this being carried out
by a transfer on the G relay). The relay A will operate only
if the two ends of the coil receive different polarities, and
this will occur only if the switch setting differs from the
state of the network under test as indicated by the state of
the relay G. If such a disagreement occurs the corresponding
lamp is fired by a ground coming in the third wiper of selector
switch S17.

The  starting  and  stopping  are  carried  out  by  the 
same  means  as  used  in  the  evaluate  mode. 
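The behavior of relay A amounts to a three-valued comparison. The following sketch is an illustration only. The constant names, the function names and the dictionary of switch settings are hypothetical, and the circuit under test is again stood in for by a Boolean function.

```python
from itertools import product

OPEN, DONT_CARE, CLOSED = "open", "dont_care", "closed"


def relay_a_operates(setting, circuit_closed):
    """Relay A operates only if its two coil ends receive different polarities."""
    if setting == DONT_CARE:
        return False  # one coil end is floating, so no current can flow
    return (setting == CLOSED) != bool(circuit_closed)


def compare(circuit, settings):
    """Return the states (W, X, Y, Z) whose lamps light, i.e. the disagreements."""
    return [
        state
        for state in product((0, 1), repeat=4)
        if relay_a_operates(settings.get(state, DONT_CARE), circuit(*state))
    ]
```

A state missing from `settings` is treated as "don't care", so it can never light a lamp, just as a floating coil end can never operate relay A.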

6.    The  Short  Test 

In  testing  for  contacts  in  the  circuit  that  can  be 
shorted,  the  sequencing  is  somewhat  more  involved.  Roughly 
speaking,  the  various  contacts  used  in  the  circuit  are  short- 
circuited  one-by-one,  and  for  each  contact  the  circuit  goes 
through  a  sequence  similar  to  the  comparing  mode  of  behavior 
just  described  (comparing  the  circuit  when  this  contact  is 
shorted with the desired characteristics set up on the three-position
switches). If any disagreement is found, the neon
lamp associated with the contact in question is fired, indicating
that this contact is necessary in the circuit and cannot
be shorted. Actually, the sequence is a bit more complicated
since, to save time and equipment, the tests on the make and
break parts of a transfer in the circuit being tested are
interleaved.

To carry out the short test switch S18 is put in
the short position (the position of S19 is irrelevant). The
selector switches S16 and S17 start in positions 21 and 22
respectively, so that relays S and T are both operated. When
the starting button S20 is pressed, the magnets of both S16
and S17 are energized and when S20 is released they step
ahead one step, releasing both S and T and allowing the buzzer
circuit to start. The first step of selector switch S16
causes E to operate. This removes the voltage from the indicating
lamps L16 to L39 (removing any indication on these
lamps from previous runs). Stepper 1 then proceeds through
a complete revolution. At step 17 the second wiper applies
a voltage to the coil of S16, pulsing S16 ahead one notch.
This releases E, and reapplies voltage to the indicating
lamps L16 to L39. The wipers of selector switch S16 are now
connected to position 1 (the top row) of this selector. The
sixth wiper operates relay H1 which disconnects the first W
transfer from the circuit being tested. The three points in
the circuit being tested that were previously connected to
this transfer (on the W relay) are brought down to points
P3, P5 and P7, P5 coming through the third wiper. The free
ends of the W transfer, that are now disconnected from the
circuit being tested, are brought down via wipers 2 and 4. To
test whether either part of this transfer can be shorted, the
selector switch S17 goes through a complete cycle, putting
the relays W, X, Y and Z in each possible state as in previous
modes of operation. In each state, the first test is
to short P3 to P5, which in effect shorts the nodes of the
circuit normally connected to the W part of the contact, and
the circuit state is compared with the desired specification
on the three-position switch. A disagreement operates relay
A which, by way of wiper 1, fires the lamp corresponding to
the W contact. This shorting of the nodes occurs in the buzzer
cycle during the period when the relay U is operated.
The A contact is connected to the corresponding lamp through
contact V and P' in series. This gives relay A time to operate
(or release from a previous operation) before its reading
is applied to the lamp, and also disconnects the lamp before
the state of A is changed by the next operation.


The second test in the same buzzing cycle is to
short the break contact of the transfer. This occurs when U
releases, connecting P3 to P4 and P5 to P7. The W make is
then connected as usual in the circuit being tested (via the
H1 make, U' and wiper 2) and the nodes previously connected
to the back W' contact are shorted via the 3rd wiper of selector
switch S16. In this part of the buzzing cycle the disagreement
relay is connected via P and V' contacts (for timing
margins similar to P' and V before) and the 5th wiper, to the
lamp corresponding to the W' or break contact. This lamp
will fire, as before, if a disagreement occurs, indicating that
the contact is necessary.


After selector switch S17 has run through all states
(rows 1 to 16) it applies ground through wiper 2 to the magnet
of selector switch S16, advancing it one step. The machine
now applies the shorting test to the X and X' contacts connected
to the second row of selector switch S16. Proceeding in this
manner it tests all the contacts. On reaching row 13, the 6th
wiper of selector S16 applies ground to its own coil through
its own back contact. This causes it to step rapidly through
the remaining positions until it reaches row 21 where it operates
relay T. The first selector switch is meanwhile still
being pulsed by the buzzer circuit. After T operates, the
first time S17 reaches row 22, relay S operates and the buzzer
stops. This completes the test.

If it is desired to hurry the machine through the
latter part of a test (for example if only a few of the available
contacts are being used and these are near the top) the
reset button S21 can be pressed. This causes S16 to run
rapidly to the stop position (row 21).

7.     The  Open  Test 

The  test  for  opening  contacts  proceeds  exactly  as 
the short test just described, except that having switch S18
in  the  open  position  opens  wiper  3  of  S16.    This  opens  the 
short  that  was  applied  in  the  previous  test  to  the  nodes 
normally  connected  to  the  contact  being  tested.    The  relay 
therefore  indicates  the  behavior  of  the  circuits  when  the 
different  contacts  are  opened. 
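The short and open tests of Sections 6 and 7 share one logical core: force a single contact closed (short) or open, run the comparison over all sixteen states, and fire that contact's lamp on any disagreement. The sketch below shows that core under simplifying assumptions. The network is described by a hypothetical function of per-contact conduction values rather than by plugboard wiring, and the interleaving of the make and break halves of a transfer is ignored.

```python
from itertools import product

RELAYS = "WXYZ"


def conducts(contact, state):
    """A make contact conducts when its relay is operated, a break contact when it is not."""
    relay, kind = contact
    operated = bool(state[RELAYS.index(relay)])
    return operated if kind == "make" else not operated


def necessary_contacts(contacts, network, spec, force):
    """Indices of contacts whose lamps fire when each is forced in turn.

    force=True models the short test, force=False the open test.
    `network` maps a list of per-contact conduction values to True (closed)
    or False (open). `spec` maps states to True/False; don't-care states
    are simply omitted.
    """
    needed = []
    for i in range(len(contacts)):
        for state in product((0, 1), repeat=4):
            values = [conducts(c, state) for c in contacts]
            values[i] = force  # replace contact i by a short or by an open
            want = spec.get(state)
            if want is not None and network(values) != want:
                needed.append(i)  # disagreement: this contact is necessary
                break
    return needed
```

For example, take two W make contacts in series with two X make contacts in parallel, specified to realize "W and X". The short test reports the two X contacts as necessary (either W contact may be shorted), while the open test reports the two W contacts as necessary (either X contact may be opened).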

8. The "Prove" Mode of Operation

When switch S18 is set in the "prove" position
the machine indicates, by lighting some of the lamps numbered
from L40 upward, that certain contacts are necessary in any
circuit which realizes the switching function set up on the
sixteen three-position switches. This indication is obtained by moving
switch S22 through its four possible positions. In the W
position the machine tests whether W and/or W' contacts are
necessary and if so, lights the corresponding lamps, etc.

The method of operation is based on the following
result in switching theory (stated for simplicity for the case
of four variables). At least one W (make) contact is necessary
in any realization of a given switching function if there
are one or more states of the other relays (X, Y, and Z) such
that when the X, Y and Z relays are in such a state, changing
the W relay from unoperated to operated changes the function
from open to closed. At least one W' (break) contact is necessary
if there exists a state of the X, Y and Z relays such
that when they are in this state, operating the W relay changes
the circuit from closed to open. These are both obvious, since
the only way by which operating the W relay alone could close a
previously open circuit is by establishing an operating path
through a make contact on the W relay, and similarly for the
condition with a break contact.

The  condition  that  a  W  contact  is  necessary  can 
also  be  thought  of  geometrically  in  the  following  way.  The 
sixteen  states  of  the  four  relays  can  be  thought  of  as  the 
vertices  of  a  four-dimensional  cube.    This  cube  consists  of 
two  three-dimensional  subcubes,  the  first  being  the  eight 
states  of  the  X,  Y,  Z  relays  with  W  not  operated,  and  the 
second, the eight states of the X, Y, Z relays with W operated.
If there is any point in the "W unoperated" cube in
which the circuit is open (closed) while being closed (open)
in the corresponding point of the "W operated" cube, at least
one W (W') contact is necessary.
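The necessity condition can be checked directly from a three-valued specification. The sketch below is illustrative only. The function name and the dictionary representation are hypothetical, with don't-care states omitted from the dictionary. The size of the returned set is the lower bound on the number of contacts that the "prove" mode establishes.

```python
from itertools import product

RELAYS = "WXYZ"


def required_contacts(spec):
    """Return labels such as "W" (make) or "W'" (break) proved necessary.

    spec maps a state (W, X, Y, Z) to True (closed) or False (open).
    """
    required = set()
    for k, name in enumerate(RELAYS):
        for state in product((0, 1), repeat=4):
            if state[k] != 0:
                continue  # start from the subcube with relay k unoperated
            operated = state[:k] + (1,) + state[k + 1:]
            before, after = spec.get(state), spec.get(operated)
            if before is None or after is None:
                continue  # a don't-care state proves nothing
            if not before and after:
                required.add(name)        # operating the relay closes the circuit
            elif before and not after:
                required.add(name + "'")  # operating the relay opens the circuit
    return required
```

For the function "W and X" this yields one W and one X make contact, so two contacts are necessary. For the exclusive-or of W and X it yields W, W', X and X', a lower bound of four.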

The "Prove" part of the circuit can best be understood
in terms of this geometrical picture. A two-terminal
network with terminals a and b is set up in the machine,
corresponding to this cube. Every vertex of the cube for which
the circuit should be closed is connected to terminal a; all
vertices for which the circuit should be open are connected
to terminal b ("don't care" vertices are left floating). When
testing for the necessity of W or W' contacts, eight diodes
are connected between corresponding points of the three-dimensional
subcubes mentioned above. These point from the
"W unoperated" subcube to the "W operated" subcube. Current
will pass from terminal a to terminal b if and only if a W'
contact is necessary. This is true since this conduction
can take place only by entering the cube at a closed state
(these being the only ones connected to terminal a), passing
through a diode in the conducting direction (this requires
that the closed state be in the "W unoperated" cube) and leaving
the cube to terminal b at an open state. Thus the conditions
for conduction from a to b are identical with the conditions
for necessity of a W' contact. In a similar manner, it
may be seen that the network will conduct from b to a if and
only if a W contact is necessary.

In operation, the circuit is alternately tested for
conduction in the two directions. The alternation is obtained
by operation of the three-stage buzzer previously described.
When P is operated, the circuit is tested for conduction from
A to B. If this condition occurs, it fires the corresponding
neon lamp (for the W, X, Y or Z make contact). When P is released,
voltage is applied to the AB network in the reverse
direction and if conduction occurs, it fires the corresponding
neon lamp (for the W', X', Y' or Z' break contact). These
lamps remain fired until released either by turning off the
main power or flipping the "evaluate-compare" switch S19 from
one position to the other.
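The two-direction test on the diode network can be modeled without reference to which lamp is fired. In the sketch below (an illustration with hypothetical names) closed vertices are tied to terminal a, open vertices to terminal b, and one diode runs from each vertex of the "unoperated" subcube to the corresponding vertex of the "operated" subcube for the coordinate selected by S22.

```python
from itertools import product


def conduction(spec, k):
    """Return (a_to_b, b_to_a) for the diode network along coordinate k.

    spec maps a state (W, X, Y, Z) to True (closed, tied to terminal a) or
    False (open, tied to terminal b); don't-care states are omitted and float.
    """
    a_to_b = b_to_a = False
    for tail in product((0, 1), repeat=4):
        if tail[k] != 0:
            continue  # diodes start in the "unoperated" subcube
        head = tail[:k] + (1,) + tail[k + 1:]
        t, h = spec.get(tail), spec.get(head)
        if t is True and h is False:
            a_to_b = True  # a -> closed vertex -> diode -> open vertex -> b
        if t is False and h is True:
            b_to_a = True  # b -> open vertex -> diode -> closed vertex -> a
    return a_to_b, b_to_a
```

A floating vertex at either end of a diode blocks that path, which is why don't-care states can never contribute to a proof of necessity.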

Although  it  has  been  explained  that  the  circuit  for 
doing  these  tests  is  laid  out  in  the  shape  of  a  four-dimensional 
cube,  the  circuit  diagram  of  Figure  3  is  not  drawn  by  the  use 
of  a  direct  projection  of  such  a  cube,  but  is  laid  out  in  a 
plane by a method due to W. Keister (The Design of Switching
Circuits,  D.  Van  Nostrand,  1951,  p.  174),  which  simplifies  its 
appearance. 

It can easily be verified that by putting switch
S22 in any one of its four positions the circuit in Figure 3
reduces to a 4-dimensional cube with 8 diodes joining its two
halves. However, the manner in which these 4 sets of 8 diodes
each were combined to give a total of only 14, while at the
same time using only 8 decks of the switch S22, may be of interest.
It can be applied to give similar economies in the
design of analogous circuits for cubes of any dimension. This
method depends on some concepts due to R. W. Hamming (Bell
System Technical Journal, 29, pp. 147-160, April, 1950). It
is possible to divide the vertices of an n-cube into two mutually
exclusive and collectively exhaustive classes, called
parity classes, depending on whether the number of coordinates
having the value 1 is even or odd. If a point belongs to one
parity class, all of the points which have distance 1 from it
(and hence differ in only one coordinate from it) are in the
opposite parity class. This means that every edge of the cube
connects vertices of opposite parity classes. Since in every
position of S22 the diodes are connected along edges of the
cube, it means that it is necessary to be able to connect
diodes only between points of opposite parity classes.

Thus  the  diodes  are  all  connected  to  the  points  of 
one  parity  class,  and  the  decks  of  switch  S22  are  connected  to 
the  points  of  the  other  class.    If  one  diode  pointing  toward 
and  one  pointing  away  from  each  point  of  the  even  parity  class 
is  provided,  then  the  switch  contacts  can  connect  each  point  of 
the  other  parity  class  to  the  other  end  of  the  proper  one  of 
these  two  diodes.    In  the  actual  circuit  not  quite  this  many 
diodes  are  used,  since  the  points  0000  and  1111  require  only 
one of the two diodes.

9. Notes and Comments

The  small  size  and  portability  of  this  machine  depend 
on  the  fact  that  a  mixture  of  relay  and  electronic  circuit  ele- 
ments were  used.    The  gas  diodes  are  particularly  suited  for  use 
where  a  small  memory  element  having  an  associated  visual  display 
is  required,  and  the  relays  and  selector  switches  are  particu- 
larly suited  for  use  where  the  ability  to  sequence  and  inter- 
connect using  only  a  small  weight  and  space  is  required.  In 
all,  the  relay  circuit  analyzer  uses  only  24  relays,  2  selector 
switches,  48  miniature  gas  diodes,  and  14  germanium  diodes  as 
its  logical  elements. 
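The figure of 14 germanium diodes can be checked against the parity-class construction of Section 8. The sketch below enumerates the 4-cube, confirms that every edge joins opposite parity classes, and counts one outgoing and one incoming diode per even-parity vertex, omitting the diode that a vertex can never use. The function names are hypothetical and the code checks only the counting argument, not the wiring of Figure 3.

```python
from itertools import product

VERTICES = list(product((0, 1), repeat=4))


def parity(v):
    """0 for the even parity class, 1 for the odd one."""
    return sum(v) % 2


def cube_edges():
    """All edges (tail, head) of the 4-cube; head has one more coordinate equal to 1."""
    return [
        (v, v[:k] + (1,) + v[k + 1:])
        for v in VERTICES
        for k in range(4)
        if v[k] == 0
    ]


def diode_count():
    """Diodes needed when every diode is attached to an even-parity vertex."""
    count = 0
    for v in VERTICES:
        if parity(v) == 0:
            count += (0 in v)  # can be an "unoperated" end: one diode pointing away
            count += (1 in v)  # can be an "operated" end: one diode pointing toward
    return count
```

The vertex 0000 is never the operated end of an edge and 1111 is never the unoperated end, so each needs only one diode, giving 8 x 2 - 2 = 14.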

It  may  be  of  interest  to  those  familiar  with  gen- 
eral purpose  digital  computers  to  compare  this  method  of  solu- 
tion of  this  problem  on  such  a  small,  special-purpose  machine 
with  the  more  conventional  method  of  coding  it  for  solution  on 
a  high-speed  general-purpose  computer.    One  basic  way  in  which 
the  two  methods  differ  is  in  the  directness  with  which  the  cir- 
cuits being  analyzed  are  represented.    On  a  general-purpose 
computer  it  would  be  necessary  to  have  a  symbolic  description 
of  the  circuit,  probably  in  the  form  of  a  numerical  code  des- 
cribing the  interconnections  of  the  circuit  diagram,  and  repre- 
senting the  types  of  contacts  that  occur  in  the  various  parts 
of  the  circuit  by  means  of  a  list  of  numbers  in  successive 
memory  locations  of  the  computer.    On  the  other  hand,  the  relay 
circuit  analyzer  represents  the  circuit  in  a  more  direct  and 
natural  manner,  by  actually  having  a  copy  of  it  plugged  up  on 
the  front  panel. 

This  difference  in  the  directness  of  representation 
has  two  effects.    First,  it  would  be  somewhat  harder  to  use 
the  general-purpose  computer,  because  the  steps  of  translating 
the  circuit  diagram  into  the  coded  description  and  of  typing 
it  onto  the  input  medium  of  the  computer  would  be  more  compli- 
cated and  lengthy  than  the  step  of  plugging  up  a  circuit  dir- 
ectly.   The  second  effect  is  in  the  relative  number  of  logical 
operations  (and  hence,  indirectly,  the  time)  required  by  the 
two  kinds  of  machines.    To  carry  out  the  fundamental  step  in 
this  procedure  of  determining  whether  the  given  circuit  (or 
some  modification  of  it  obtained  by  opening  or  shorting  a 
contact)  is  open  or  closed  for  some  particular  state  of  the 
relays  requires  only  a  single  relay  operate  time  for  the  re- 
lay circuit  analyzer.    However,  the  carrying  out  of  this  fun- 
damental step  on  a  general-purpose  digital  computer  would  re- 
quire going  through  several  kinds  of  subroutines  many  times. 
There  would  be  several  ways  of  coding  the  problem,  but  in  a 
typical  one  of  them  the  computer  would  first  go  through  a 
subroutine  to  determine  whether  a  given  contact  were  open  or 
closed, repeating this once for each contact in the circuit,
and then would go through another subroutine once for each
node of the network. Altogether this would probably involve
the execution of several hundred orders on the computer, although
by sufficiently ingenious coding this might be cut down
to perhaps 100. Since each order of a computer takes perhaps
100 times the duration of a single logical operation (i.e., a
pulse time, if the computer is clock-driven), it turns out that
what takes 1 operation time on one machine takes perhaps 10,000
on another.

Since  10,000  is  approximately  the  ratio  between 
the  speed  of  a  relay  and  of  a  vacuum  tube  in  performing  logical 
operations,  this  gain  of  about  10,000  from  the  directness  of 
the  representation  permits  this  relay  machine  to  be  as  fast  as 
a  general-purpose  electronic  computer. 

This  great  disparity  between  the  speeds  of  a  general- 
purpose  and  of  a  special-purpose  computer  is  not  typical  of 
all  kinds  of  problems,  since  a  typical  problem  in  numerical 
analysis  might  only  permit  of  a  speed-up  by  a  factor  of  10 
on  a  special-purpose  machine  (since  multiplications  and  div- 
isions required  in  the  problem  use  up  perhaps  a  tenth  of  the 
time of the problem). However, it seems to be typical of
combinatorial problems that a tremendous gain in speed is
possible by the use of special rather than general-purpose
digital computers. This means that the general-purpose machines
are not really general in purpose, but are specialized
in such a direction as to favor problems in analysis. It is
certainly true that the so-called general-purpose machines
are logically capable of solving such combinatorial problems,
but their efficiency in such use is definitely very low. The
problems involved in the design of a general-purpose machine
suitable for a wide variety of combinatorial problems seem to
be quite difficult, although certainly of great theoretical
interest.

10.  Conclusion 

An  interesting  feature  of  the  relay  circuit  analy- 
zer is  its  ability  to  deal  directly  with  logical  circuits  in 
terms  of  3-valued  logic.    There  would  be  considerable  interest 
in  techniques  permitting  easy  manipulation  on  paper  with  such 
a  logic,  because  of  its  direct  application  to  the  design  of 
economical  switching  circuits.    Even  though  such  techniques 
have  not  yet  been  developed,  machines  such  as  this  can  be  of 
value in connection with 3-valued problems.

Whether or not this particular kind of machine
ever  proves  to  be  useful  in  the  design  of  practical  relay 
circuits,  the  possibility  of  making  machines  which  can  assist 
in  logical  design  procedures  promises  to  be  of  value  to 
everyone  associated  with  the  design  of  switching  circuits. 
Just  as  the  slide  rule  and  present-day  types  of  digital  com- 
puters can  help  perform  part  of  the  routine  work  associated 
with  the  design  of  linear  electrical  networks,  machines  such 
as this may someday lighten much of the routine work associated
with the design of logical circuits.

FIGURE 1

THE RELAY CIRCUIT ANALYZER WAS ABLE TO SIMPLIFY
THIS CIRCUIT, REMOVING ONE CONTACT, IN LESS THAN
TWO MINUTES TOTAL TIME. CAN YOU DO AS WELL?

The central part of the Throbac circuit is a relay accumulator
which can count up to eighty in a modified Roman numeral
system. The accumulator is arranged so that it is possible to add
or subtract I, V, X or L to or from the contents of the accumulator. It
consists of seven stages of W-Z circuits. The first three stages
W1-Z1, W2-Z2 and W4-Z4 accumulate "I's". These stages are arranged
to count up to four and recycle to zero at the fifth I. Thus,
within these stages either zero, one, two, three or four "I's" will
be registered. The number of "I's" appears in binary form in the
three stages of W-Z.

The next W-Z combination accumulates "V's", either
zero or one V being registered here. The final three stages
WX1-ZX1, WX2-ZX2 and WX4-ZX4 accumulate "X's" from zero up to seven.

If the relay F is operated, the accumulator is arranged
to add; if F is released, to subtract. Supposing F operated,
closing PI adds I to the contents of the accumulator. Closing
PV adds V, PX adds X and PL adds L. This may be verified by tracing
out the circuit paths into the W-Z circuits in the various
cases. For example, if the accumulator has zero in it, all W's and
Z's are released, and when PI is closed a ground passes through a
chain of contacts PI-F-Z^-F to pulse the W1-Z1 pair, and this is
the only W-Z pair to receive a ground. If, instead, PL had been
pulsed, the WX4-ZX4 pair and the WX1-ZX1 pair would both receive
ground, thus registering L (XXXX + X). A study of the circuit
will show that in all cases it adds or subtracts (according to
the state of F) I, V, X or L when PI, PV, PX or PL is operated.

At the bottom of this circuit a connection leads out to
control the C relay. This connection will be seen to carry a
ground when a number is added to the accumulator which causes it
to overrun its limit either by addition, giving a number greater
than seventy-nine, or, by subtraction, a number less than zero.
In these cases the carrying to or borrowing from what would be
the next column goes out on the lead in question to control the
C relay. This relay, to be described later, indicates the end
of a division.

The  number  registered  in  the  accumulator  is  displayed 
on  the  panel  by  means  of  a  series  of  thirteen  lights.  These 
lights  are  controlled  by  contact  networks  on  the  W-Z  relays  of 
the  accumulator.    The  contact  networks  translate  from  the  modified 
Roman  numeral  notation  to  the  standard  one.    The  part  of  the  number 
which  is  a  multiple  of  ten  appears  in  the  three  left  columns  of 
lights: L7 or X7, L6 or X6, L5 or X5. The part of the number
registered  which  is  less  than  ten  appears  in  the  four  right 
columns  of  lights. 

As an example, suppose the number registered is LXIV
(64). In the accumulator the W-Z pairs W4-Z4 (IIII), WX4-ZX4 and
WX2-ZX2 (XXXXXX) will be operated and other W-Z pairs released.
In the accumulator light circuit it will be found that lights
L6, X5, I4 and V3 will receive a ground and be illuminated,
displaying the number LXIV.
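The translation performed by the light network can be sketched in Python. This is only an illustration under stated assumptions: the accumulator is taken to hold units and tens on binary-weighted W-Z pairs, the names W4, WX2 and WX4 follow the example above, and the names W1, W2, W8 and WX1, the weights, and the lookup tables are hypothetical, chosen for the sketch rather than read off the actual contact network.

```python
# Assumed weights of the accumulator's W-Z pairs (units and tens in binary).
WEIGHTS = {"W1": 1, "W2": 2, "W4": 4, "W8": 8,
           "WX1": 10, "WX2": 20, "WX4": 40}

# Standard Roman notation for the units (0-9) and the tens (0-70).
UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX"]


def display(operated_pairs):
    """Return the standard Roman numeral for a set of operated W-Z pairs."""
    value = sum(WEIGHTS[name] for name in operated_pairs)
    if not 0 <= value <= 79:
        raise ValueError("the accumulator holds 0 to 79 only")
    return TENS[value // 10] + UNITS[value % 10]


# The example in the text: W4, WX4 and WX2 operated, i.e. 4 + 40 + 20 = 64.
print(display({"W4", "WX4", "WX2"}))  # prints LXIV
```

The sketch goes through an ordinary integer, which the relay network does not do; it only shows that the two notations carry the same information.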


The sequencing for adding or subtracting a number entered
in the keyboard into the accumulator is carried out chiefly by
stepping switch A. For such an addition or subtraction, this
stepper sweeps across the keyboard, starting from the right-hand
column and sequentially adding or subtracting the numbers registered
in each column. The addition sequence is started by pressing the
ADD button, which causes P to operate and lock in through a back
contact on K. The operation of P causes the buzzer relay B to start
operating and releasing at about ten cycles per second. When B
closes it pulses the stepping coil of stepper A, moving it ahead one
notch. The release of B puts a ground on the wipers of the stepper
and, therefore, on the first vertical connection through the
keyboard switches. Let us suppose that the number XLVI is entered in
the keyboard in the four right-hand columns. I is then registered
in the rightmost column and the ground from the stepper passes
through this I push button to operate the P_I relay. The F relay
has been operated by P and therefore I is added to the previous
contents of the accumulator. On the next cycle of the buzzer, the
stepper moves to the next column and operates the P_V relay, which
adds V into the accumulator. P_V also causes R to operate and
lock in through K'. The purpose of this is to cause any further I's
to be subtracted rather than added. On the next cycle of the
buzzer, ground is applied to the third vertical of the keyboard
and, because of the L entered there, operates the P_L relay. This
adds L to the accumulator and also operates the S relay, which also
locks in through K'. The operation of S signifies that an L has
occurred and consequently any X's or V's now encountered on the
keyboard must be subtracted. On the next cycle of the buzzer, the
fourth vertical receives ground and, because of the X in this column,
P_X operates. Since S is closed, the relay H also operates, releasing
F and making the accumulator subtract instead of add. The
timing of these relays is adjusted so that F releases before the
P_X pulse could add into the accumulator. X is therefore subtracted.
On the next three cycles of the buzzer, no further numbers are
encountered and the accumulator does not change. On the eighth
cycle, the wipers pass a ground to the K relay, which locks in
momentarily, and also to the reset coil of the stepper. The operation
of K releases relays P, R and S and also disconnects the buzzer
and the wipers. The reset coil allows the wipers to return to their
normal position and, since they have been disconnected by K, they have
no effect as they pass over the keyboard columns. When the wipers
reach their normal position they open the off-normal switch of the
stepper. This releases K and the addition operation is complete.
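The right-to-left sweep can be sketched in a few lines of Python. This is a simplified model, not the relay logic itself: the individual flag relays are replaced by one assumed rule (a symbol smaller than the largest symbol already met on the sweep takes the opposite sign), the accumulator is taken to wrap modulo 80, and the function and variable names are hypothetical.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}


def sweep(accumulator, keyboard, subtract=False, limit=80):
    """Sweep the keyboard from the right-hand column, adding each symbol
    (or subtracting it, if `subtract` is set) into the accumulator."""
    largest_seen = 0
    for symbol in reversed(keyboard):
        value = VALUES[symbol]
        adding = not subtract
        if value < largest_seen:
            # A smaller symbol met after a larger one takes the opposite sign.
            adding = not adding
        accumulator += value if adding else -value
        largest_seen = max(largest_seen, value)
    return accumulator % limit


print(sweep(0, "XLVI"))                  # I, V and L are added, X subtracted: 46
print(sweep(46, "XLVI", subtract=True))  # the same sweep with the signs reversed: 0
```

The reduction modulo 80 is applied once at the end for brevity; the overrun lead described earlier is not modelled here.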

The process of subtraction is essentially the same.
Pressing the subtract button causes M to operate and lock up, which
starts the buzzer and the stepping operations. In this case,
however, F is normally released, so that numbers encountered in
the keyboard are normally subtracted. However, when a smaller
number is encountered after a larger one the relay F will operate,
causing it to be added.


Multiplication is obtained by successive addition. If
the MV button is pressed, the machine adds the contents of the
keyboard into the accumulator V times; if the MX button is pressed,
X times. This counting is controlled by stepper B. If the MI
button is pressed, the keyboard contents are added or subtracted
depending on whether the MV or MX buttons have been previously
operated.

Suppose VIII is to be multiplied by IV. VIII is entered
in the keyboard and first the MV and then the MI push buttons
pressed. When the MV button is pressed, relay MV operates and
locks in through Q'. The relay T also operates, locking in through
the Clear Upper key. The relay T signifies that I's occurring later
in the multiplier must be interpreted as negative. The operation
of MV causes the P relay to operate and start an addition operation.
When stepper A reaches the eighth point, K operates, causing the
stepping coil of stepper B to receive a ground (through the MV make).
When stepper A resets to normal, P again operates, again adding the
keyboard contents into the accumulator and advancing stepper B at
the end of the addition. This process continues until stepper B
reaches its fifth point. There the ground on the wipers operates
relay Q, which releases MV and stops the series of additions.
Q locks in and applies ground to the reset coil of stepper B,
returning it to normal. When it reaches normal, the off-normal contacts
are opened and Q is released.


Next the MI button is pressed. Since T is in (due to
the previous operation of MV), this causes H to operate and the
machine subtracts the keyboard contents from the accumulator. This
completes the multiplication. The MX button produces a sequence
similar to the MV button, except that stepper B must go to the tenth
point instead of the fifth to operate Q and stop the series of
additions.

If  another  multiplication  is  to  be  performed,  the  Clear 
Upper  button  should  be  pressed.    This  releases  T  and  resets  stepper 
B  to  normal  if  for  some  reason  it  is  not  already  there. 
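The multiplication procedure amounts to repeated addition controlled by the order in which the multiplier buttons are pressed. The following Python sketch models only that arithmetic, assuming the accumulator starts cleared and wraps modulo 80; the function name, the `t_relay` flag and the `limit` parameter are hypothetical stand-ins for the relays and steppers described above.

```python
REPEATS = {"MI": 1, "MV": 5, "MX": 10}


def multiply(keyboard_value, presses, limit=80):
    """Accumulate keyboard_value once per count of each button pressed.
    An MI press after MV or MX subtracts, as relay T requires."""
    accumulator = 0
    t_relay = False
    for button in presses:
        sign = -1 if (button == "MI" and t_relay) else 1
        for _ in range(REPEATS[button]):      # counted by stepper B
            accumulator = (accumulator + sign * keyboard_value) % limit
        if button in ("MV", "MX"):
            t_relay = True                    # later I's count as negative
    return accumulator


# VIII times IV: press MV (five additions), then MI (one subtraction).
print(multiply(8, ["MV", "MI"]))  # 40 - 8 = 32
```

As in the example in the text, the multiplier digits are pressed starting from the right-hand end of the Roman numeral.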

Division is performed by successive subtraction. The
dividend is entered in the accumulator and the divisor in the
keyboard. When the divide button is pressed, relay E operates and
locks in through P' or K'. C is normally out and E, therefore,
causes M to operate and lock in, starting a subtraction. If, during
this subtraction, the accumulator does not run through zero, C will
not operate and another subtraction will occur since M will again
operate as soon as K releases. At each subtraction of this sort
the operation of K at the end of the subtraction energizes the
stepping coil of stepper B, advancing it one step. Eventually in
this subtraction process the contents of the accumulator will go
negative. This causes C to operate and indicates that one too
many subtractions have been performed. The last subtraction is
not counted on stepper B since its operating path passes through C'.
The operation of C causes the next operation to be an addition, since
the next ground when K releases is placed on P rather than M. The
machine therefore goes through one addition sequence (compensating
in the accumulator for the extra subtraction). At the eighth point
of this sequence K operates and, since P is operated, the hold on
E opens and E releases. This stops any further additions or
subtractions and also releases the C relay for the next division.
Stepper B will be at a level equal to the number of subtractions
(not counting the extra one) and its position therefore is the
quotient desired. The value of this quotient is indicated on the
quotient lights, which are wired to the contacts of the stepper in
such a way as to indicate in Roman numerals the position of the
wipers. This dial is cleared by pressing the Clear Upper button,
which operates the reset coil of stepper B.
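The division sequence is the familiar restoring method: subtract until the accumulator goes negative, then add the divisor back once. A minimal Python sketch follows, using ordinary integers in place of the relay accumulator; the function name and the decision to return the restored accumulator as a remainder are assumptions made for illustration.

```python
def divide(dividend, divisor):
    """Divide by repeated subtraction, restoring after the overshoot.
    Returns (quotient, remainder)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    accumulator = dividend
    stepper_b = 0                  # counts the successful subtractions
    while True:
        accumulator -= divisor
        if accumulator < 0:        # the C relay operates: one subtraction too many
            break
        stepper_b += 1
    accumulator += divisor         # the compensating addition
    return stepper_b, accumulator


print(divide(46, 7))  # (6, 4)
```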

C. E. Shannon


April 9, 1953

TOWER OF HANOI

C.  E.  Shannon 

The  Tower  of  Hanoi  machine  automatically  solves  a  well-known  puzzle 
constructed  as  follows.  There  are  three  pegs  standing  upright  in  a  horizontal  plate. 
On  the  first  peg  are  a  number  of  disks  of  graduated  sizes.  The  problem  is  to  move  all 
these  disks  to  the  third  peg  subject  to  the  rules  that  (1)  only  one  disk  can  be  moved  at 
a  time,  and  (2)  a  disk  can  never  be  placed  on  top  of  a  smaller  disk. 

This puzzle has been treated in the literature. It can be readily proved by
induction that with n disks, 2^n - 1 moves are necessary. For suppose this formula is
true up to n-1. With n disks, in order to move the largest one to the third peg it is
necessary that all the other disks be on the second peg in proper order. This, by
assumption, requires 2^(n-1) - 1 moves. Moving the largest disk requires one more and
moving the n-1 disks from the second to the third peg, again by the inductive
hypothesis, requires 2^(n-1) - 1 moves. Consequently the entire operation requires 2^n - 1
moves. Since the formula is true for n = 1, it holds in general. The argument also
shows how to build up a solution for any n from n-1, and hence, eventually, from the
n = 1 case.
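The constructive half of this argument translates directly into a recursive procedure. The Python sketch below builds the solution for n disks from the solution for n-1 exactly as described and counts the moves; the function name and the numbering of the pegs as 0, 1, 2 are choices made for illustration.

```python
def hanoi_moves(n, src=0, dst=2, spare=1):
    """Return the list of moves (disk, from_peg, to_peg) that transfers
    disks 1..n (1 is the smallest) from src to dst."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, spare, dst)     # clear the way: 2^(n-1) - 1 moves
            + [(n, src, dst)]                       # move the largest disk: 1 move
            + hanoi_moves(n - 1, spare, dst, src))  # bring the rest over: 2^(n-1) - 1 moves


for n in range(1, 7):
    print(n, len(hanoi_moves(n)))
```

The printed counts follow 2^n - 1. The code shows only that this many moves suffice; that no shorter solution exists is the content of the induction above.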

For n = 6 (the case handled by the machine) the solution is given by the following
table.

000000   000000      100000   211111
000001   000001      100001   211112
000010   000021      100010   211102
000011   000022      100011   211100
000100   000122      100100   211200
000101   000120      100101   211201
000110   000110      100110   211221
000111   000111      100111   211222
001000   002111      101000   210222
001001   002112      101001   210220
001010   002102      101010   210210
001011   002100      101011   210211
001100   002200      101100   210011
001101   002201      101101   210012
001110   002221      101110   210002
001111   002222      101111   210000
010000   012222      110000   220000
010001   012220      110001   220001
010010   012210      110010   220021
010011   012211      110011   220022
010100   012011      110100   220122
010101   012012      110101   220120
010110   012002      110110   220110
010111   012000      110111   220111
011000   011000      111000   222111
011001   011001      111001   222112
011010   011021      111010   222102
011011   011022      111011   222100
011100   011122      111100   222200
011101   011120      111101   222201
011110   011110      111110   222221
011111   011111      111111   222222

The first column gives the binary numbers from 0 to 63. The second column
describes the positions of the disks. For example, 000000 means that all disks are on
peg 0. The fifth entry 000122 means that the three largest disks are on peg 0, the next
smaller disk on peg 1, and the two smallest disks on peg 2. The numbers in the
second column are related in a peculiar manner to the binary numbers in the first
column and can be calculated from them. The process can best be described by an
example. Take, for instance, the binary number 010110. The following calculation is
performed.

+ - + - + -
0 1 0 1 1 0
0 2 2 1 2 2
0 1 2 0 0 2

The columns here alternate + and -. The second row 022122 is obtained by summing
the first row horizontally mod 3 with + or - sign depending on the column. Thus 0=0,
2=0-1, 2=0-1+0, 1=0-1+0-1, 2=0-1+0-1+1 and 2=0-1+0-1+1-0 (all mod 3). The
third row is obtained from the second by alternately adding and subtracting the first
row from it (adding in the + columns and subtracting in the - columns, mod 3).
This row is the corresponding position of the disks in the solution of the
puzzle. It can be shown that this relation holds in general.
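The calculation can be checked mechanically. The Python sketch below implements the three-row rule for the six-disk case and compares all 64 results with the states visited by the usual recursive solution; the function names are hypothetical, and the sign pattern is fixed to the six columns of the example rather than generalized to other n.

```python
def disk_positions(count):
    """Peg (0, 1 or 2) of each of the six disks, largest first,
    after `count` moves, computed from the binary digits of count."""
    bits = [(count >> shift) & 1 for shift in range(5, -1, -1)]
    signs = [+1, -1, +1, -1, +1, -1]
    running = 0
    pegs = []
    for bit, sign in zip(bits, signs):
        running = (running + sign * bit) % 3     # second row: signed running sum
        pegs.append((running + sign * bit) % 3)  # third row: apply the signed bit again
    return pegs


def recursive_states(n=6):
    """States of the standard recursive solution, from peg 0 to peg 2."""
    pegs = [0] * n                   # index 0 is the largest disk
    states = [list(pegs)]

    def move(k, src, dst, spare):    # move the k smallest disks from src to dst
        if k == 0:
            return
        move(k - 1, src, spare, dst)
        pegs[n - k] = dst
        states.append(list(pegs))
        move(k - 1, spare, dst, src)

    move(n, 0, 2, 1)
    return states


print(disk_positions(0b010110))  # [0, 1, 2, 0, 0, 2], the worked example
print([disk_positions(c) for c in range(64)] == recursive_states())
```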

The  Tower  of  Hanoi  relay  circuit  is  based  on  this  curious  relation.  The  machine 
basically  consists  of  a  binary  counter  (six  stages  of  W-Z  counters)  which  counts  from 
0  to  63.  Contacts  on  these  relays  are  connected  in  a  network  which  controls  a  set  of 
eighteen  lights.  There  are  three  lights  for  each  of  the  six  disks,  one  on  each  of  the 
three  pegs.  At  a  given  time,  one  of  these  three  will  be  on,  indicating  the  position  of 
the  corresponding  disk.  As  the  counter  proceeds  through  its  count,  the  lights  are 
switched  to  indicate  the  process  of  the  solution. 

The  circuit  of  the  machine  is  shown  in  Fig.  1.  The  right  hand  network  controls  the 
lights.  It  will  be  seen  that  this  consists  of  a  symmetric  function  lattice  in  which  the 
stages  alternately  add  and  subtract  mod  3.  The  ground  coming  in  at  the  bottom  of  this 
circuit  will  appear  in  columns  0',  1',  2'  according  to  the  first  number  computed  in  the 
above  calculation  (i.e.  0'2'2'1'2'2'  in  the  example  given).  The  further  calculation 
(012002 in the example) is carried out by the single stage mod 3 circuits attached to
the basic mod 3 lattice.

It  is  interesting  in  this  circuit  that  when  one  of  the  larger  disks  is  moved  the  lamps 
corresponding  to  smaller  disks  receive  their  operating  current  through  a  path  which  is 
switched.  The  counting  process,  however,  is  so  rapid  that  they  appear  to  be 
continuously  illuminated. 

The  control  circuit  at  the  left  of  the  figure  contains  a  three-position  key  switch.  In 
the  center  position,  the  machine  stops.  In  the  top  position,  it  causes  the  buzzer  B  to 
operate  the  counter  and  therefore  proceed  through  the  solution  at  about  two  steps  per 
second.  When  the  count  reaches  sixty-three,  the  buzzer  stops.  If  the  key  switch  is 
depressed  to  the  lower  position  (non-locking),  the  counter  is  advanced  one  count.  By 
moving  the  switch  between  the  center  and  the  lower  positions  the  solution  can  be 
observed  step  by  step. 


Mathmanship or How to Give an Explicit Solution Without Actually
Solving the Problem


After  reading  several  weighty  papers  giving  formulas 
which  assume  only  prime  values,  I  felt  moved  to  develop  a  few 
further  results  of  the  same  type. 

Theorem 1. There exists a unique real positive number λ < 1
such that

$$[2^{n}\lambda] - 2[2^{n-1}\lambda] = \begin{cases} 0 & \text{if } n \text{ is composite} \\ 1 & \text{if } n \text{ is prime} \end{cases}$$

Here [x] means, as usual, the largest integer in x.
The value of λ is .414 ....

Theorem 2. There exists a unique real positive number μ < 1
such that the nth prime p_n is given by

-  IS?*1  u]  -  22*1  L2^  u] 

Note the improvement over previous results: this
formula gives all the primes, not just some of them.
For analysts who find the bracket symbol a little
suspect, we have the following:

Theorem 3. There exists a real number η such that sin 2^n η is
positive or negative according as n is prime or composite.


»  2  a 

Theorem  4*    There  exists  a  real  number  &  such  that 

-  tan  2^  5|  <^ 

Proofs  are  left  as  an  exercise  for  the  reader. 

C. E. SHANNON

The Relay Circuit Synthesizer is a machine to aid
in switching circuit design. It is capable of designing
two-terminal circuits involving up to four relays in a few minutes.
The solutions are usually minimal. The machine, its operation,
characteristics and circuits are described.

Purpose and Operation

The Relay Circuit Synthesizer (Photograph 214142)
is a machine to aid in the design of a certain class of relay
circuits. The circuits it handles are two-terminal
switching circuits involving up to four relays or (by simple
alterations) other two-valued elements. The desired
characteristics of the circuit to be designed are entered in a set
of sixteen three-position switches on the front panel of the
machine. After a period of computation, averaging about five
minutes, the machine stops and displays a circuit satisfying
the requirements. The circuit is displayed in geometric form
on a card in an associated card display mechanism (Photograph
214140). The labels of the contacts on this card must, however,
be interpreted in accordance with indicating lights on the
front panel of the machine to obtain the proper answer to the
design problem.

In  about  eighty  per  cent  of  the  possible  problems 
that  can  be  set  up  on  the  machine,  the  solution  it  gives  will 
be  minimal  in  contacts,  i.e.,  the  number  of  contacts  in  the 
circuit  cannot  be  reduced.    In  the  remaining  twenty  per  cent, 
the  designs  cannot  be  simplified  by  more  than  one  contact  and 
may,  in  fact,  be  minimal. 

The sixteen input switches correspond to the sixteen
possible states of the four relays in the circuit being
designed. Each of these switches has three positions labeled
"open," "don't care" and "closed". If, for a given state of
these relays, it is desired that the circuit be open, the
corresponding switch is set in the "open" position. Similarly
for the "closed" position. If it does not matter whether the
circuit be open or closed in this state, the switch is set at
"don't care". The Synthesizer takes advantage of any switches
in the "don't care" position in attempting to reduce the
number of contacts used in the final circuit. It fills in
these unspecified states in such a way as to minimize contact
requirements. This ability to handle partially specified
switching problems is one of the main features of the
Synthesizer and enables it to solve problems for which analytic
methods are at present ill-adapted.

In addition to the direct circuit designing procedure
outlined above, the Synthesizer is equipped with
controls for other modes of operation. It may be run at
low speed for demonstration purposes, it may be set up to
find all the circuits in its card file satisfying the
requirements (not just the one with the smallest number of
contacts) and it may be used to determine various mathematical
properties associated with switching functions.

By changing the paper tape and the card file used
(but without any internal change within the electrical part
of the machine) it can be made to solve design problems
involving diode circuits instead of relay contact circuits.
By a still different tape and set of cards it can minimize
the number of transfers in relay circuits instead of the
number of contacts. With suitable tape and card file, it can
solve a variety of other similar problems.

The Synthesizer represents a first step toward
machine design of switching circuits. Unfortunately, although
the method used in the Synthesizer may be generalized in
principle to circuits involving five or more variables, the time
for solution increases at an alarming rate. With five
variables it would take many thousand times as long to obtain a
solution. The card file and the tape would be about two
thousand times their present size and would require many man years
to construct. Consequently, a direct generalization of the
Synthesizer is hardly indicated, even with the high speeds
available in electronic computing gear.

Speed  of  Solution  With  Random  Problems 

An  idea  of  the  time  required  for  the  Synthesizer 
to  solve  problems  may  be  obtained  from  some  tests  with  random 
settings  of  the  input  switches.    Using  a  book  of  random  num- 
bers, ten  sets  of  sixteen  random  binary  digits  were  obtained. 
These  were  set  up  as  input  switch  settings  using  0  to  mean 
closed  and  1  open,  and  the  time  required  for  the  machine  to 
solve each of these problems was measured. The following table
gives the results of this test.

Binary Digits       Solution      Trans-       No. of     Time of
(Switch Settings)   Circuit No.   formation    Contacts   Solution

0 0 1 1             #279          w'  w        8          4min-10sec.
0 0 0 0                           x'  z
0 1 1 1                           y   y
1 1 0 0                           z'  x

1 0 0 1             #177          w'  x        6          1min-10sec.
0 0 1 0                           x   y
0 0 1 0                           y   z
1 1 1 1                           z   w

1 0 1 0             #306          w   z        10         7min-20sec.
0 0 0 1                           x'  y
1 0 0 1                           y'  w
0 0 0 1                           z'  x

0 0 0 1             #261          w   z        10         7min-7sec.
1 0 0 0                           x'  w
1 0 0 1                           y   y
1 1 1 0                           z'  x

1 0 1 0             #212          w   x        11         9min-6sec.
0 1 1 1                           x'  w
1 0 0 1                           y'  y
0 1 0 0                           z   z

0 1 0 1             #137          w   w        9          6min-32sec.
0 0 0 0                           x'  x
1 0 1 1                           y   y
1 1 1 0                           z   z

0 1 0 0             #75           w   x        9          6min-10sec.
1 0 1 1                           x   z
1 1 1 0                           y'  y
1 1 1 0                           z   w

0 0 0 0             #240          w'  y        5          35sec.
0 0 1 1                           x'  w
0 0 1 1                           y'  x
1 0 1 1                           z   z

0 1 0 0             #193          w   z        8          4min-30sec.
1 0 1 0                           x'  y
1 0 1 0                           y   w
1 1 1 0                           z   x

1 0 0 0             #34           w   x        9          5min-50sec.
0 1 1 1                           x'  z
1 0 1 1                           y   w
1 0 1 1                           z   y

The Solution Circuit Number refers to the Table
in MM-52-180-45, E. F. Moore, "A Table of Four Relay Two
Terminal Contact Networks". The Transformation indicates the
required change of variables in interpreting the numbered
circuit of this Table. The average solution time for these
ten completely specified random functions was 5 min.-15 sec.,
and the average number of contacts in the solution was 8.5.

A second test was run with partially specified
random functions. Again using the Table of Random Numbers,
four switches were chosen at random for "don't care" settings,
the remaining switches being given random "open" or "closed"
settings. This was done four times, leading to the following
results:

Binary Digits
(Switch Settings)   Solution      Trans-       No. of     Time of
D = "Don't Care"    Circuit No.   formation    Contacts   Solution

D 1 0 1             #334          w   w        6          3min-5sec.
0 D 0 0                           x   x
D 1 0 D                           y   y
0 0 0 0                           z   z

D 0 1 D             #189          w'  w        7          6min-30sec.
D 1 0 1                           x   z
0 1 0 0                           y   y
D 0 1 0                           z   x

0 1 0 D             #178          w   y        8          7min-25sec.
0 D 1 1                           x'  w
D 0 0 1                           y   z
D 0 1 1                           z'  x

0 0 1 D             #58           w   y        3          12sec.
0 D D 1                           x   w
D 0 1 1                           y'  z
1 0 1 1                           z'  x

The average time of solution for these problems with four
unspecified states was 4 min.-20 sec., with an average of 6
contacts.

Finally, a test was run with random problems having
eight unspecified ("don't care") states. These results were
as follows:

Binary Digits
(Switch Settings)   Solution      Trans-       No. of     Time of
D = "Don't Care"    Circuit No.   formation    Contacts   Solution

0 D 1 D             #204          w   w                   55sec.
D D D 0                           x   z
D 0 0 1                           y   y
D 0 0 D                           z   x

0 D D 1             #179          w   y        6          2min-55sec.
1 D 1 D                           x   x
1 0 1 0                           y'  z
D D D D                           z'  w

0 0 D D             #5?           w   y                   40sec.
0 0 D 1                           x   x
1 D D 1                           y   w
0 D D D                           z   z

D D D D             #79           w'  y                   3min-15sec.
1 1 D D                           x'  z
D 1 1 0                           y   x
D 1 0 1                           z   w

The average solution time here was 1 min.-56 sec., and the
average number of contacts 4.5.

The following table summarizes these average figures:

                    Completely    Unspecified    Unspecified
                    specified     in 4 states    in 8 states

average time        5min-15sec    4min-20sec     1min-56sec

average number      8.5           6              4.5
of contacts

With still more "don't care" states the solution time and
average number of contacts would undoubtedly decrease still
further.

General  Theory  of  Operation 

The Relay Synthesizer deals with Boolean functions
of four variables. Each of the variables has two possible
values, 0 or 1; in conjunction there are 2^4 = 16 sets of values
or "states" of the variables. For each of these states, a
function of these variables can be either 0 or 1. Thus there
are 2^16 = 65,536 different Boolean functions of four variables.
It is known that these 65,536 functions can be subdivided into
402 classes or "types" of functions. Two functions are said to
be of the same type if one may be obtained from the other by
negating some of the variables or permuting some of the
variables or both. Thus the function

w + x'(y+z)

is of the same type as

x' + z(w+y')

or

w' + y(x'+z').

All functions of a given type present substantially the same
design problem. If a good circuit is found for one of them,
it applies equally to all other functions of the same type,
for it is necessary only to relabel contacts properly and it
will represent these other functions.

In the memorandum referred to above, circuits are
given for these 402 types of functions. At present writing,
331 of these have been proved to be minimal in contacts; the
remaining 71 are known to be within one contact of being
minimal. This catalog of circuits is a key part of the design
procedure in the Relay Synthesizer.
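The figure of 402 types can be reproduced with a short counting argument. The Python sketch below uses Burnside's lemma, a standard technique that the memorandum itself does not mention: each of the 4! x 2^4 = 384 permutation-negation operations is treated as a permutation of the 16 states, and a function is left unchanged by an operation exactly when it is constant on each cycle of that permutation. The function name is hypothetical.

```python
from itertools import permutations, product


def count_types(n=4):
    """Number of classes of Boolean functions of n variables under
    permutation and negation of the variables (Burnside's lemma)."""
    states = list(product((0, 1), repeat=n))
    index = {state: i for i, state in enumerate(states)}
    fixed_total = 0
    group_size = 0
    for perm in permutations(range(n)):
        for negations in product((0, 1), repeat=n):
            # Where this operation sends each of the 2^n states.
            image = [index[tuple(s[perm[k]] ^ negations[k] for k in range(n))]
                     for s in states]
            # Count the cycles of that permutation of the states.
            seen = [False] * len(states)
            cycles = 0
            for start in range(len(states)):
                if not seen[start]:
                    cycles += 1
                    j = start
                    while not seen[j]:
                        seen[j] = True
                        j = image[j]
            fixed_total += 2 ** cycles   # functions unchanged by this operation
            group_size += 1
    return fixed_total // group_size


print(count_types(4))  # 402
```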


The reader may wonder why the Synthesizer is
necessary for designing circuits when such a catalog is available.
Why not merely find the circuit corresponding to the desired
function in the catalog? The answer is that it is not at all
easy to find the type or class to which a given function
belongs even when the function is completely specified. If the
desired function is not completely specified (has one or more
"don't care" states) there will in general be many types of
functions consistent with the requirements, and it becomes
extremely difficult to locate these in the catalog. The
Synthesizer is, in fact, a machine for determining the type of a
fully specified function and (in the partially specified case)
the possible type having the least number of contacts in its
catalog circuit.

A  block  diagram  of  the  Synthesizer  is  shown  in 
Figure  1,  and  indicates  the  main  functional  organization.  The 
specifications  of  the  desired  circuit  are  set  up  on  the  input 
switches  in  the  right-hand  box.    The  catalog  of  the  402  types 
of  functions  appears  on  a  paper  tape  in  the  left-hand  Tape 
Input  box.    Each  function  occupies  six  lines  of  tape.  The 
first  four  lines  give  the  states  for  which  the  function  is 
closed.    The  fifth  line  gives  the  number  (in  binary  form)  of 
closed  states  for  the  function,  and  the  sixth  line  contains  a 
special  hole  marking  the  end  of  data  relating  to  this  function, 
i.e.,  it  acts  as  a  punctuation  mark  separating  functions  on 
the  tape. 

In solving a particular problem, the tape functions
are studied one by one in the machine. All permutations and
negations of a particular tape function are compared with the
desired specifications as set up on the input switches. When
an exact match is found the machine stops, and the tape
function together with the permutation being applied to it
represent a solution to the problem.

In the block diagram this is carried out as follows:
The tape function is stored in the memory relays. A
permuting-negating network applies the equivalent of the various possible
permutation and negation operations to these data. The results
of each permutation-negation operation are compared with the
input switches in a comparison circuit to see if a match has
occurred. If not, an error signal is fed back to the
permutation sequencer, causing it to advance to the next permutation
operation which is, in turn, compared, etc., until all of the
384 possible permutations and negations have been tested.
Because of short-cut circuits to be described later, the machine
frequently skips many of these, reducing the solution time
considerably.
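The search loop just described can be sketched in Python. This is an illustrative model only: the two-entry catalog is hypothetical (it is not Moore's table), a function is represented as the set of states in which it is closed, the direction in which a permutation and negation are applied is an assumed convention, and none of the short-cut circuits are modelled.

```python
from itertools import permutations, product

STATES = list(product((0, 1), repeat=4))   # values of (w, x, y, z)


def synthesize(spec, catalog):
    """spec maps each state to True (closed), False (open) or None (don't care).
    catalog lists (name, closed_states) in order of increasing contacts.
    Returns (name, perm, negations) for the first match, or None."""
    for name, closed_states in catalog:                  # one tape function at a time
        for perm in permutations(range(4)):              # 24 permutations
            for negations in product((0, 1), repeat=4):  # times 16 negations = 384
                def transformed(state):
                    mapped = tuple(state[perm[k]] ^ negations[k] for k in range(4))
                    return mapped in closed_states
                # Comparison circuit: every specified state must agree.
                if all(spec[s] is None or transformed(s) == spec[s] for s in STATES):
                    return name, perm, negations
    return None


# Hypothetical catalog, ordered by number of contacts.
catalog = [
    ("single contact w", frozenset(s for s in STATES if s[0])),
    ("w and x in series", frozenset(s for s in STATES if s[0] and s[1])),
]

# Desired circuit: closed whenever y is released, with one state left unspecified.
spec = {s: (s[2] == 0) for s in STATES}
spec[(1, 1, 1, 1)] = None
print(synthesize(spec, catalog))
```

Because the catalog is scanned in order of increasing contacts, the first match returned is also the cheapest one the catalog contains.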

When  the  set  of  operations  on  a  particular  function 
is  exhausted,  the  permutation  sequencer  sends  a  signal  back 
to  the  tape  driving  circuit,  and  the  next  function  is  read 
into  the  memory  for  test.    This  signal  also  causes  the  card 
display  device  to  drop  another  card  from  its  stack.    The  card 
displayed  always  corresponds  to  the  function  being  tested  in 
the machine and shows the most efficient known circuit for
that  function. 

The  permutation  indicator  is  controlled  by  the  per- 
mutation sequencer  and  indicates  in  lights  the  permutation 
currently  being  tested.    When  the  machine  stops  at  a  solution, 
these  lights  show  what  permutation  and  negation  must  be  ap- 
plied to  the  circuit  on  the  card  to  solve  the  problem  at  hand. 

In problems involving "don't cares," the Synthesizer
could be used to find all of the solutions successively,
but to use all this information in designing a circuit it would
be necessary to compare all the circuits obtained and see which
one is preferred. Since the ground for preferring one circuit
over another has been taken to be economy of contacts, the
necessity for this comparison step has been eliminated by
arranging the functions on the tape in order of increasing number
of contacts, so that the first solution arrived at will
automatically be the preferred one. Arranging the functions on
the tape in terms of any other criterion will cause the
Synthesizer to design circuits based on that criterion. If, for
instance, it is desired to design relay circuits using as few
springs as possible, or to design diode logic circuits using as
few diodes as possible, it is only necessary to arrange the
functions on the tape in order of number of springs or number of
diodes, respectively.
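
The effect of ordering the tape can be shown with a small sketch. The
catalogue entries, their costs and the helper first_match are all
invented for illustration; the point is only that a scan over a list
sorted by the chosen criterion stops at the cheapest usable entry.

```python
def first_match(tape, is_solution, cost):
    """Scan a catalogue sorted by `cost`; the first hit is the cheapest."""
    for entry in sorted(tape, key=cost):
        if is_solution(entry):
            return entry
    return None

# Hypothetical catalogue entries: (name, contacts, diodes).
tape = [("c1", 4, 5), ("c2", 3, 9), ("c3", 6, 2)]
usable = {"c1", "c3"}                      # pretend these match the target

by_contacts = first_match(tape, lambda e: e[0] in usable, cost=lambda e: e[1])
by_diodes = first_match(tape, lambda e: e[0] in usable, cost=lambda e: e[2])
```

With these made-up figures the contact ordering selects c1 and the
diode ordering selects c3, with no comparison step after the search.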

Circuit  Operation 

Figure 2 is the circuit diagram of the Synthesizer.
The layout of subcircuits corresponds roughly to the block
diagram, Figure 1. We will first describe the circuit operation
in the logically simplest mode of operation, the normal mode
with all short-cut circuits eliminated. In Figure 2, then, we
assume the mode of operation switch in the "Normal" position
N, the relay Q operated (eliminates permutation short cuts)
and the number of state switches M set at "Normal."

Since the Synthesizer is essentially a closed loop
system, it is difficult to find a point at which to start a
description of its operation. It is perhaps simplest to assume
that the machine has just finished testing one function
on the tape. The relay H may then be assumed to have just
operated, locking in to the make on Rs, since the tape reader
will be at the division line between functions and consequently
Rs operated. Operation of H releases the hold on the memory
relays (M0, M1, ..., M15) and also the hold on the steering
counter relays (three WZ pairs), thus resetting this
counter to zero. It also applies voltage to the teletype
magnet which, a moment later, will pull free of the tape and
hence release Rs. This releases H and reconnects the holds
of the steering counter and the memory relays. It also
establishes a path to the slow relay SO through its own back
contact SO'. SO now acts like a slow buzzer, producing
pulses at a rate of about six per second, and relay U follows
these pulses through the SO make contact.

The pulses produced by U operate the teletype magnet,
advancing it line by line until it reaches the line with an Rs
hole, at which point the back contacts on Rs open both the
buzzer circuit to SO and the teletype magnet circuit through U.
The pulses produced by U are also fed into the three-stage
binary counter consisting of three WZ pulse dividers.
This counter, therefore, keeps track of the
line of tape, counting from the last division between two tape
functions (Rs hole). This counter controls the steering trees
leading into the memory relays M0, M1, ..., M15 and the number of
state relays V1, V2, V4, V8. The first line of tape after the Rs
line is fed into M0, M1, M2, M3, the second line into M4, M5, M6, M7,
the third into M8, M9, M10, M11, the fourth into M12, M13, M14, M15,
and the fifth into V1, V2, V4, V8. A section of the tape is
shown in Figure 3.
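
As a rough sketch of this steering, the five data lines of one tape
function can be decoded as follows. The function name read_function
and the order of the four holes within a line (lowest-numbered relay
first) are assumptions made for the example, not details taken from
Figure 3.

```python
def read_function(lines):
    """Decode five 4-bit tape lines into (memory, closed_state_count).

    lines[0..3] fill M0..M15 four at a time; lines[4] holds the count bits
    for V1, V2, V4, V8 (weights 1, 2, 4, 8), lowest weight first (assumed).
    """
    if len(lines) != 5 or any(len(line) != 4 for line in lines):
        raise ValueError("expected five lines of four holes each")
    memory = [bit for line in lines[:4] for bit in line]      # M0 .. M15
    v1, v2, v4, v8 = lines[4]
    return memory, v1 + 2 * v2 + 4 * v4 + 8 * v8
```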

The completion of this tape reading operation,
indicated by closure of Rs, puts ground on lead 106 leading into
the permutation-negation network.

Permuting and Negating Circuits

These circuits enable the machine to apply the
384 negation and permutation operations to the tape function
stored in the memory to compare it with the desired function
set on the input switches.

The negation-permutation sequencer consists of nine
WZ pairs connected in a form of counting circuit which can go
through 384 different states. Starting from the high-speed
(pulsed) end of this circuit, the first five WZ pairs, E, D, B,
C and A, relate to permutations and can go through twenty-four
states corresponding to the 4! = 24 permutations of the four
variables. The other four stages w, x, y, z relate to negating
the variables and can go through sixteen states corresponding
to the sixteen ways of negating four variables. In combination
this gives 16 x 24 = 384 states.

In the circuit, imagine Q operated, FQ and FT£ released
and that Fo is pulsed, so that a series of pulses is
applied to line 109. The negation-permutation sequencer will
then proceed through the 384 negation-permutation operations.
This sequence is shown in the accompanying Table I for the
first twenty-four of these, i.e., a full set of permutations.
At the twenty-fourth step this sequence repeats for the
permutation relays but a pulse is applied at lead 250, advancing
the negating relays one step. The negating relays go through
the sequence shown in Table II, advancing one step after the
permuting relays have gone through a full set of permutations.
In this manner the full set of 16 x 24 combinations is
exhausted.

Table I
Sequence of Permutations

           Relays              Relays        Permutation
       WA WB WC WD WE      A  B  C  D  E     W X Y Z
       (1 means operated)                    becomes

  0     0  0  0  0  0      1  1  1  1  1     W X Y Z
  1     0  0  0  1  1      1  1  1  0  0     W Y Z X
  2     0  0  0  1  0      1  1  1  0  1     W Z Y X
  3     0  0  1  1  1      1  1  0  0  0     W Y X Z
  4     0  0  1  1  0      1  1  0  0  1     W Z X Y
  5     0  0  0  0  1      1  1  1  1  0     W X Z Y
  6     0  1  0  0  0      1  0  1  1  1     Y X W Z
  7     0  1  0  1  1      1  0  1  0  0     Z Y W X
  8     0  1  0  1  0      1  0  1  0  1     Y Z W X
  9     0  1  1  1  1      1  0  0  0  0     X Y W Z
 10     0  1  1  1  0      1  0  0  0  1     X Z W Y
 11     0  1  0  0  1      1  0  1  1  0     Z X W Y
 12     1  1  0  0  0      0  0  1  1  1     X Y Z W
 13     1  1  0  1  1      0  0  1  0  0     Y Z X W
 14     1  1  0  1  0      0  0  1  0  1     Z Y X W
 15     1  1  1  1  1      0  0  0  0  0     Y X Z W
 16     1  1  1  1  0      0  0  0  0  1     Z X Y W
 17     1  1  0  0  1      0  0  1  1  0     X Z Y W
 18     1  0  0  0  0      0  1  1  1  1     X W Z Y
 19     1  0  0  1  1      0  1  1  0  0     Y W X Z
 20     1  0  0  1  0      0  1  1  0  1     Z W X Y
 21     1  0  1  1  1      0  1  0  0  0     Y W Z X
 22     1  0  1  1  0      0  1  0  0  1     Z W Y X
 23     1  0  0  0  1      0  1  1  1  0     X W Y Z

Table II
Sequence of Negations

      Relays           Relays        Variables
   Ww Wx Wy Wz       W  X  Y  Z      W  X  Y  Z
                                     become

    0  0  0  0       1  1  1  1      W  X  Y  Z
    0  0  0  1       1  1  1  0      W  X  Y  Z'
    0  0  1  1       1  1  0  0      W  X  Y' Z'
    0  0  1  0       1  1  0  1      W  X  Y' Z
    0  1  0  0       1  0  1  1      W  X' Y  Z
    0  1  0  1       1  0  1  0      W  X' Y  Z'
    0  1  1  1       1  0  0  0      W  X' Y' Z'
    0  1  1  0       1  0  0  1      W  X' Y' Z
    1  1  0  0       0  0  1  1      W' X' Y  Z
    1  1  0  1       0  0  1  0      W' X' Y  Z'
    1  1  1  1       0  0  0  0      W' X' Y' Z'
    1  1  1  0       0  0  0  1      W' X' Y' Z
    1  0  0  0       0  1  1  1      W' X  Y  Z
    1  0  0  1       0  1  1  0      W' X  Y  Z'
    1  0  1  1       0  1  0  0      W' X  Y' Z'
    1  0  1  0       0  1  0  1      W' X  Y' Z

At  the  end  of  this  sequence,  a  ground  is  applied 
to  line  135  which  initiates  reading  in  a  new  function. 
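
The combined counting order can be sketched as four nested loops. The
state orders below are read from the W relay columns of Tables I and
II; the grouping into loops and the name sequencer_states are choices
made for this illustration, not a description of the relay wiring.

```python
AB_ORDER = ["00", "01", "11", "10"]                       # WA WB, Table I
CDE_ORDER = ["000", "011", "010", "111", "110", "001"]    # WC WD WE, Table I
PAIR_ORDER = ["00", "01", "11", "10"]                     # Table II, each pair

def sequencer_states():
    """Yield (Ww Wx Wy Wz, WA..WE) strings in the order the counter visits."""
    for wx in PAIR_ORDER:                # slowest stages: Ww, Wx
        for yz in PAIR_ORDER:            # then Wy, Wz
            for ab in AB_ORDER:          # then WA, WB
                for cde in CDE_ORDER:    # fastest (pulsed) end: WC, WD, WE
                    yield wx + yz, ab + cde

states = list(sequencer_states())
```

The list holds 384 distinct states, and the negating part first changes
at index 24, after one full set of permutations.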

It may also be noted that if relay Q is released
and F16 is operated a ground is applied directly to line 250,
the input to the negating part of the counter. This will
cause the counter to skip a set of permutations and advance
directly in the negating sequence by one step. Operation of
F16 also releases the plus side of the permutation relays in
the sequencer, resetting them to zero. The function of F16
is to short-cut some of the calculation in certain cases as
will be described later.

In a similar way, operation of the second short-cut relay
with Q released advances the WA and WB parts of the permutation
sequence by one step, skipping a subset of six permutations in
which WC, WD and WE take part. This relay releases the plus to
these three WZ pairs, resetting them to zero. This also is used
for short-cut purposes.

The  permuting  and  negating  relays  A,  B,  C,  D,  E  and 
W,  X,  Y,  Z    are  operated  from  back  contacts  of  the  correspond- 
ing W  relays  in  the  WZ  pairs  of  the  sequencer.    Thus  they  as- 
sume the  complementary  states  as  shown  in  Tables  I  and  II. 
The  function  of  these  nine  sets  of  relays  is  to  interchange 
sixteen  leads  representing  the  function  in  the  memory  relays 
in  accordance  with  the  permutation  and  negation  in  the  se- 
quencer. 

The logical organization of this circuit can be
represented in a symbolic form by Figure 4, which indicates
the effect of the negating and permuting relays on the variables
of the tape function (not the effect on the sixteen leads).
Thus, the W relay negates the variable W when released, the X
relay negates X, etc. The A relay interchanges W and X and
also Y and Z when released, the B relay interchanges the
variables now appearing (after the possible A interchange) on the
first and third lines, etc. It will be found that the
twenty-four combinations of A, B, C, D, and E produced by the
sequencer (Table I) lead to the twenty-four permutations of the
four variables as shown in Table I.

Now the circuit does not work with the four Boolean
variables but with sixteen lines representing the sixteen
states of the four variables. Negating a variable, say W,
corresponds to interchanging the eight lines (or states) for
which W is 1 with the corresponding eight lines for which W
is zero. Thus in the permuting circuit, the W negation box
of Figure 4 becomes eight reversing or interchanging circuits
operated by the relays W1, W2, W3, W4. A similar statement
applies to the negation of the other variables and the
permuting of the variables by the A, B, C, D and E relays.
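
The correspondence between operations on the variables and
interchanges of lines can be sketched as follows. The numbering of
lines as 8W+4X+2Y+Z and the names negate and interchange are
assumptions for the example; the actual lead numbering is that of
Figure 2.

```python
BIT = {"W": 8, "X": 4, "Y": 2, "Z": 1}     # assumed weights: state = 8W+4X+2Y+Z

def negate(lines, var):
    """Interchange each line where `var` is 1 with its partner where it is 0."""
    return [lines[s ^ BIT[var]] for s in range(16)]

def interchange(lines, a, b):
    """Interchange variables a and b: swap lines whose a and b bits differ."""
    def partner(s):
        bit_a, bit_b = bool(s & BIT[a]), bool(s & BIT[b])
        return s ^ (BIT[a] | BIT[b]) if bit_a != bit_b else s
    return [lines[partner(s)] for s in range(16)]
```

Negating one variable moves all sixteen lines in eight pairs, while
interchanging two variables moves only the eight lines on which the
two variables differ.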

To summarize, the sequencer can go through 384
states representing the 384 permutations and negations. The
negating-permuting network sets up the corresponding
interchanges of the sixteen lines from the memory to the input
switches. At the memory end, these lines are given plus or
minus voltage according as the memory function is open or
closed. At the input switch end, after the permutation and
negation, these voltages are compared with the settings of
the input switches.

There are two types of comparison circuits. The
first type, Figure 5, applies to switches 0, 7, 8 and 15.
It will be seen that Fk will operate if the lead from the
permuting network is positive and the switch is set at "closed,"
or if the lead is negative and the switch is set at "open,"
i.e., if there is a disagreement between the switch setting
and the value coming in from the permuting network. If the
switch is set at "don't care," Fk will not operate. It will
also be seen that the red and green lights will indicate
"closed" and "open" settings of the switch respectively,
while if set at "don't care" the red or green light will
indicate minus or plus coming in from the permuting network.

The comparison circuit for the other switches is
somewhat different. There are two relays F1 and F2 common
to all the other switches. If a particular switch is set at
"closed," the line from the permuter goes through a diode
to F1, the other side of F1 being minus (when the test is
made). Thus F1 will operate if a plus appears on the line
from the permuter (disagreeing with the "closed" position
of the switch). If the switch is set at "open," the path
from the permuter goes through the same diode but in the
opposite direction to F2, whose other side is connected to
plus. Hence F2 will operate if a minus comes in from the
permuter. The red and green lamps are connected substantially
as before.
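
The logic common to both comparison circuits can be summarized in a
short sketch. It follows the convention stated earlier that plus
represents an open memory function and minus a closed one; the names
disagrees and failure and the string encodings are hypothetical.

```python
def disagrees(lead, switch):
    """True when a failure relay would operate for this line.

    lead:   '+' (memory function open) or '-' (closed) after permutation
    switch: 'open', 'closed' or 'dont_care'
    """
    if switch == "dont_care":
        return False
    return (lead == "+") == (switch == "closed")

def failure(leads, switches):
    """True if any of the sixteen lines disagrees with its switch."""
    return any(disagrees(l, s) for l, s in zip(leads, switches))
```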

Returning now to the description of the operating
sequences in the machine, we recall that the completion of
tape reading of a function into the memory was signified by
closure of Rs. This applies ground at lead 106 into a long
"equality chain" of contacts. This chain is closed only if
all of the W relays in the WZ pairs of the sequencer agree
in position with their corresponding Z relays. This being
true, ground is applied to the permuting and negating
network, and, as already described, one or more of the F relays
(F0, F7, F8, F15, F1, F2) will operate unless the tape
function as permuted through the network agrees with the input
function. Assuming there is a disagreement, at least one
of the secondary failure relays will operate, grounding the
input to the negation-permutation sequencer. This advances
the W relays of the sequencer one step in the sequence, and
causes a disagreement between at least one of the W relays and
its corresponding Z relay in the WZ pairs. This disagreement,
in turn, opens the "equality chain," releasing the F relays,
which, in turn, removes the ground from the sequencer and
allows its Z relays to follow their corresponding W relays.
When equality has again been established, ground is again
applied through the "equality chain" to the permuting network
and the next permutation of the sequence (now set up on the
permuting network) is tested in the same way. This cycle of
operations continues until the full set of permutations and
negations has been tested. After the last permutation, the
next ground goes through a Zw contact and the mode of
operation switch to operate H, signifying the completion of tests
on the current function and initiating reading the tape for
the next function as previously described.

If,  at  some  point,  the  permuted  tape  function 
matches  the  input  function,  no  F  relay  will  operate  and  the 
cycle  is  stopped.    Relay  J  will  operate  and,  in  turn,  L 
through  the  chain  of  back  contacts  on  the  F  relays.  The 
operation  of  L  rings  the  gong  indicating  a  solution,  and 
pulses  the  message  register  for  counting  purposes. 

Short-Cut  Operation 

We  now  describe  the  short-cut  provisions.    If  the 
short-cut  eliminator  is  "off,"  relay  Q  will  release,  rear- 
ranging the  inputs  to  the  sequencer.    In  the  permuting  net- 
work it  will  be  seen  that  the  lines  on  the  zero  level  and 
on  the  15  level  are  not  switched  after  the  vertical  column 
of  Z  contacts,  i.e.,  after  emerging  from  the  negating  part  of 
this  circuit.    This  means  that  if  a  disagreement  occurs  on 
either  of  these  lines,  it  will  persist  throughout  all  the 
permutations,  which  only  change  the  switches  A,  B,  C,  D  and 
E  in  this  network.    Hence,  in  case  of  such  a  disagreement  it 
is  not  necessary  to  test  all  of  these  permutations  but  the 
machine  can  proceed  immediately  to  the  next  negation  saving 
a  great  deal  of  time. 

In the circuit, when Q is released, operation of
F0 or F15 pulses directly into the negating part of the
sequencer and resets the permuting part to zero.

In a similar manner, it will be seen that the lines
at the 7 and 8 level in the permuter are not switched after the
B contacts. This means that a disagreement on either of these
lines, indicated by operation of F7 or F8, will persist over
the subset of six permutations in which C, D and E change.
Hence it is unnecessary in such a case to test each of these
individually and the machine advances to the next permutation
involving a change of A or B. In the sequencer, a ground is
applied at the input to the A, B stages and the C, D, E stages
are reset to zero. This is done by a secondary relay which will
operate if either F7 or F8 indicates disagreement.
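
That these four lines are special can be checked with a short sketch.
It assumes lines are numbered 8W+4X+2Y+Z and, for the subset of six,
that W is the variable left in place; the name line_after is
hypothetical.

```python
from itertools import permutations

def line_after(state, perm):
    """Line index reached from `state` when the variables are permuted."""
    bits = [(state >> (3 - i)) & 1 for i in range(4)]        # W, X, Y, Z
    moved = [bits[perm[i]] for i in range(4)]
    return moved[0] * 8 + moved[1] * 4 + moved[2] * 2 + moved[3]

all_perms = list(permutations(range(4)))
w_fixed = [p for p in all_perms if p[0] == 0]     # six permutations of X, Y, Z

fixed_by_all = [s for s in range(16)
                if all(line_after(s, p) == s for p in all_perms)]
fixed_by_six = [s for s in range(16)
                if all(line_after(s, p) == s for p in w_fixed)]
```

Under these assumptions lines 0 and 15 are unmoved by every
permutation, and lines 7 and 8 join them when only three of the
variables are permuted.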

One further short-cutting device has been incorporated
in the machine. With each tape function is included, in
binary form, the number of states for which that function is
closed. As previously described, this number is stored in the
relays V1, V2, V4, V8, V16 when the function is read off the
tape. On the front panel of the machine are two seventeen-
point switches labeled Max and Min. The Min switch should be
set at a number equal to the number of input switches in the
"closed" position. The Max switch should be set at this
number plus the number of "don't cares". Now, regardless of
how the "don't cares" may be filled in, the number of closed
states will be within this range (including the end points).
A function from the tape could not possibly be satisfactory
unless its number of states lies within this range. The
machine is arranged to compare these numbers and, if this
condition is not satisfied, to skip the function completely and
go immediately to the next function on the tape.
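
The rule for setting the two switches, and the test the machine then
applies, can be sketched as follows. The function names and the string
encoding of the switch positions are assumptions for the example.

```python
def min_max(switches):
    """Settings for the Min and Max switches from the sixteen input switches."""
    closed = sum(1 for s in switches if s == "closed")
    dont_care = sum(1 for s in switches if s == "dont_care")
    return closed, closed + dont_care

def worth_testing(closed_states, lo, hi):
    """A tape function can match only if lo <= closed_states <= hi."""
    return lo <= closed_states <= hi
```

For a completely specified function Min and Max coincide, so only tape
functions with exactly that number of closed states are tested.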

The  comparison  is  carried  out  in  the  "number  of 
states  comparison  circuit".    The  contacts  on  the  V  relays  are 
arranged  in  the  topological  dual  of  an  ordinary  tree.  This 
implies  that  if  the  number  n  is  registered  (in  binary  form) 
in  the  V  relays,  then  all  of  the  vertical  leads  labeled  zero 
to  n  at  the  Min  switch  will  be  connected  together,  but  the 
two  groups  are  not  connected.    It  will  be  seen,  therefore, 
that  if  the  number  on  the  V  switches  lies  in  the  range  covered 
by  the  Max  and  Min  settings,  then  the  Max  and  Min  swingers 
will  not  be  connected.    If  the  V  number  is  outside  this  range 
then the Max and Min swingers will be connected. If the Max
and Min swingers are connected, the operation of Rs closes a
path to operate H and start reading in a new function
immediately.

It is necessary to use five relays (V1, V2, V4,
V8, and V16) to represent all of the numbers from 0 to 16
inclusive, but there were only four holes readily available on
the tape for reading into these relays. Consequently four
of the relays are read into directly through the steering
relays, and a special artifice is used to get the fifth digit
stored in V16.

Since the only case in which this digit equals 1
is when the number of states is 16, and all the other four
relays are released, this relay is operated through the back
contacts of V1, V2, V4, and V8 in series. But since V1, V2,
V4, and V8 are also all released when the number of states is
0, a contact of M0 is also included in the operate path, to
distinguish between these two cases.
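
The artifice can be written out as a short sketch. M0 serves as the
distinguishing contact because a function closed in all sixteen states
has M0 operated, while a function closed in none has M0 released. The
function names are hypothetical.

```python
def v16_operated(v1, v2, v4, v8, m0):
    """Fifth count digit: all four V relays released AND M0 operated."""
    return (not v1) and (not v2) and (not v4) and (not v8) and bool(m0)

def closed_state_count(memory):
    """Rebuild the 0..16 count the way the relays would hold it."""
    n = sum(memory)
    low = [(n >> k) & 1 for k in range(4)]           # V1, V2, V4, V8 from tape
    v16 = v16_operated(*low, memory[0])
    return 16 * v16 + sum(bit << k for k, bit in enumerate(low))
```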

Without  the  short-cutting  features  the  average  time 
of  solution  for  a  completely  specified  function  would  be  over 
an  hour;  with  short  cuts  it  is  about  five  minutes. 

Indicating  Circuits 

A  set  of  indicating  lights  is  provided  which  shows 
the  permutation  and  negation  that  must  be  applied  to  the  tape 
function  (when  a  solution  has  been  found)  to  transform  it  into 
the  function  on  the  input  switches.    The  eight  negating  lights 
are connected in a simple fashion to the W, X, Y and Z coils.
If the W relay is out, for example, the W' lamp lights up by
a  current  through  the  W  coil  (not  sufficient  to  operate  the 
W  relay).    If  the  W  relay  is  operated,  the  W  lamp  lights  up  by 
current  through  the  Ww  contact. 

The circuit for the permuting part is more complex.
However, on tracing through the circuits it will be found
that the lights always receive proper voltages to indicate
the permutation set up on the A, B, C, D, E relays. For
example, in the first (identity) permutation, A, B, C, D and E
are all operated. It will be seen that the eight center
points between pairs of lamps receive the following voltages
(0 indicates floating):

    +  -  0  0
    0  0  +  -

Hence the diagonal series of lamps

    W  -  -  -
    -  X  -  -
    -  -  Y  -
    -  -  -  Z

will  be  lighted.  Note  that  the  lamps  connected  to  floating 
points  receive  half  voltage  by  a  sneak  path  through  the  two 
lamps  in  series.  This  is  not  sufficient  to  illuminate  them 
perceptibly. 

Another  permutation  indicating  light  circuit  has 
been  provided  for  trouble  shooting  and  for  better  observation 
of  the  machine  while  in  action.    This  consists  of  twenty-five 
small  neon  lamps.    Twenty-four  of  these  correspond  to  the 
twenty-four  permutations  of  the  variables.    These  are  ar- 
ranged in  a  rectangle  six  wide  and  four  high.    In  operation 
without  short  cuts,  these  lamps  light  sequentially  from  left 
to  right  across  the  first  row,  then  across  the  second,  etc. 
In short cuts due to the F0 and F15 relays the whole pattern
of twenty-four permutations is skipped. In short cuts due to
F7 and F8 a horizontal row in this display is skipped (only
the first lamp of the row going on).

The  circuit  controlling  these  lights  consists  of  a 
tree  on  relays  A  and  B  which  selects  the  row  and  a  second 
tree  on  C,  D  and  E  which  selects  the  column.    Only  the  lamp 
at  the  intersection  point  will  go  on.    Sneak  paths  through 
other  lamps  all  involve  at  least  three  lamps  in  series  and 
the  voltage  is  not  sufficient  for  breakdown  of  such  a  series 
combination. 

The  twenty-fifth  lamp  is  connected  to  light  up  if 
the  C,  D  and  E  relays  get  into  either  of  the  two  other  pos- 
sible states  which  do  not  correspond  to  permutations  in  the 
regular  sequence  of  operations.    It  can  thus  indicate  certain 
trouble  conditions. 

Other  Modes  of  Operation 

With  the  mode  of  operation  switch  set  in  the  P  pos- 
ition (periodic),  the  machine  does  not  advance  the  tape  after 
the  sequence  of  permutations  and  negations  but  periodically 
goes  through  the  tests  on  the  function  in  the  memory.    In  this 
switch  position  the  path  to  the  H  relay,  which  ordinarily  ini- 
tiates the  tape  reading  process,  is  open.    This  mode  is  some- 
times useful  for  trouble  shooting. 

In the S position (step-by-step), the machine tests
a permutation and then stops until the Run switch is operated
and released. The path which normally puts ground on the F
relays is opened and replaced by a contact on the Run
switch connected to a condenser. When the Run switch is off,
this condenser charges, and when the switch is pressed for a
step in the operation the condenser discharges through one of
these relays. Only enough charge
is stored to operate these relays once. For the next step the
Run switch must be released and pressed again.

In  the  L  mode  (low-speed),  the  machine  operates  as 
in  the  normal  mode  except  at  a  much  lower  speed.    This  is 
achieved  in  a  fashion  similar  to  the  step-by-step  operation 
but  with  the  function  of  the  Run  switch  replaced  by  relay  N. 
The  N  relay  is  operated  by  the  G  relay  which  is  connected  in 
a  relaxation  oscillator  circuit  using  a  gas  tube.    The  conden- 
sers charge  up  sufficiently  to  break  down  the  gas  tube  which 
operates  G,  closing  its  make  contact  and  discharging  the  con- 
denser which  then  starts  recharging.    This  slow  oscillation 
of  G  causes  N  to  oscillate  slowly  which,  in  turn,  allows  the 
solution  to  proceed  at  a  slow  rate. 

In Mode Q (self-restarting), the machine does not
stop at a solution but rings the gong, pulses the message
register, and then proceeds to the next permutation or
negation in the sequence. When a solution is reached in this mode,
the operation of relay L causes the message register to operate.
This releases relay L, which releases the message register and
also applies voltage to slow-operate relay G. Operation of G
energizes N, which in turn advances the permutation sequencer
one step and also energizes K. K locks in, releasing G and, in
turn, N, and the solution proceeds. This mode of operation
can be used to find all of the solutions to the given problem,
rather than just the first one.


C.  E.  SHANNON 

Appendix A
Main  Components  and  Their  Functions 


Relays  and  Other  Electromagnetic  Components 

M0, M1, ..., M15  Memory relays. These register the
values of the function read off the tape
for its sixteen possible states. If Mi
is operated, the function is closed in
state i.

W1, W2, W3, W4  Four parallel relays (to give sufficient
contacts). These relays negate the
variable W of the tape function. This is
done in the negating and permuting
network by interchanging the eight leads
corresponding to the variable W = 1 with
the corresponding eight leads for which
the variable W is zero.

Similar negating relays for the variable
X.

Similar negating relays for the variable
Y.

Similar negating relays for the variable
Z.

A, B, C, D, E  Permuting relays. The function of these
relays is to permute the sixteen lines
from the memory relays according to the
various permutations of the variables
W, X, Y and Z in the tape function. By
suitable combinations of operation and
release of these five sets of relays,
the interchanges corresponding to any
of the twenty-four permutations are
possible.

WZ relays arranged in a counting circuit
to go through the 384 permutations and
negations applied to the sixteen leads
in the permuter. These WZ pairs control
the preceding W, X, ..., E relays; thus
W1, W2, W3, W4 are controlled by the
W relay of the WwZw pair.

F0, F7, F8, F15  Failure relays. Operation of F0, for
example, corresponds to failure of the
permuted line coming into switch 0 to
match the value on input switch I0.
Operation of a failure relay causes the
machine to proceed to try another
permutation or tape function.

F1, F2  These are failure relays which are
operated by a failure to match on any of
the other switches not taken care of
specifically by F0, F7, F8 or F15.

F3»  Fg»  F-i6  Secondary  failure  relays.    These  are 

operated by the preceding failure relays
and sort out the type of short cut (if
any) available. F16 causes the permuter
to advance to the next negation (skipping
all permutations of the current negation).
The second relay causes the permuter to
skip the current subset of six permutations
out of the twenty-four, advancing the AB
part of the permutation one unit. F3
causes an advance of only one in the
permutation.

R0, R1, R2, R3, Rs  These relays are controlled by the five
fingers of the tape reading mechanism.
For example, a hole in the 2 row of the
tape operates R2. R0, R1, R2, R3 carry
information to the memory relays M0, ..., M15
and also to the number of state relays
V1, V2, V4, V8. Rs marks the end
of data relating to one function on the
tape.

S1, S2, S3, S4  Steering relays. These relays steer,
by means of four trees, the tape readings
on R0, R1, R2, R3 into the memory
relays and the number of state relays
V1, V2, V4, V8.

WZ pairs of the steering counter. These
sequence the steering for successive
lines of tape into the appropriate
memory and number of state relays.

V1, V2, V4, V8, V16  Number of state relays. These relays
register in binary form the number of
states for which the function currently
in the memory relays is closed.

WsZs  A WZ pair for operating the card display
unit. It causes successive functions
on the tape to operate alternately
the right and left solenoids Sr and Sl
of the display unit.

Sr, Sl  Right and left solenoids of the display
unit for releasing cards one by one
from the stack.

H  End-of-permutations  relay.    This  oper- 

ates when  the  machine  has  tested  all 
permutations  of  the  current  tape  func- 
tion, and  initiates  analysis  of  the 
next  function  on  the  tape. 

L  Success relay. This operates when the

machine  finds  a  solution  to  the  prob- 
lem. 

Q  Short cut eliminator. When operated,
this relay eliminates short cuts in the
permutation sequence.

J  A  delaying  and  checking  relay  in  the 

basic  closed  loop  of  the  system.  J 
operates  when  all  of  the  WZ  pairs  in 
the  permutation  counter  are  in  agree- 
ment. 

SO  Slow-operate  relay  in  a  buzzer  circuit 

for  producing  pulses  to  step  the  tape 
via  relay  U. 

U  Secondary  relay  operated  by  SO. 
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Reed  relay  in  a  slow  relaxation  os- 
cillator circuit  for  controlling  low- 
speed  operation  via  secondary  relay  H. 

Secondary  relay  controlled  by  G. 

Control relay relating to low-speed and
self-restarting modes of operation.

Message  register  for  counting  solutions 
to  a  problem. 

A  relay  for  connecting  the  110  volt 
supply  only  when  the  24  volt  supply  is 
on. 

A  bell  operated  by  L  which  sounds  when 
a  solution  is  found. 

A  five-hole  teletype  tape  transmitter. 
The  standard  functions  are  arranged  on 
tape  in  order  of  increasing  numbers  of 
contacts. 


Appendix  B 
Manually  Operated  Switches 


Problem input switches. These switches
have three positions, "open," "don't
care," and "closed," and are set to correspond
to the desired characteristics
of the circuit to be designed in its
sixteen states.

Mode  of  operation  switch.    This  is  a 
five-position  switch  which  determines 
the  mode  of  operation  of  the  machine. 

In clockwise order these modes are:

P = Periodic. It continues cycling
through the same permutations without
advancing to the next function.

Q  =  Step-by-step.    In  this  mode  the 
machine  tests  the  permutations  one 
at  a  time  under  control  of  the  key 
switch.    This  switch  must  be  pressed 
once  for  each  permutation. 

N  =  Normal  operation.    Runs  at  regular 
speed  to  the  first  solution  and  then 
stops. 

S = Self-restarting. At each solution,
it rings the gong and adds a count
to the message register, and then
advances to the next solution.

L = Low-speed. Similar to normal, but
at low speed for demonstration and
test purposes.

Short cut eliminator. In the "On" position
this switch operates relay Q
and eliminates short cuts in the permuting
sequence.

Next function button. Pressing this
pushbutton operates relay H, causing
the machine to advance to the next
function on the tape, omitting any remaining
permutations of the current
function.
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Appendix  B.  (Continued) 


Starts  the  machine  operating  by 
closing  its  fundamental  operating 
feedback  loop. 

Turns  power  on  for  the  machine. 

Both of these switches have seventeen
points labeled 0, 1, 2, ..., 16; the
Min switch has an additional point
labeled "Normal". In use, the Min
switch is set at the number of states
for which the function to be designed
is closed. The Max switch is set at
this number plus the number of "don't
care" states. The machine then skips
functions from the tape whose number
of closed states does not lie in this
range, thus shortening the solution
time. If the Min switch is set at
"Normal" this shortening feature is
eliminated.
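
The Min and Max settings described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not a model of the relay circuit: the function names, the string labels for the three switch positions, and the use of None to stand for the "Normal" point are all assumptions made for the example.

```python
def switch_settings(problem):
    """Derive (Min, Max) from sixteen problem input switch positions.

    problem: sixteen strings, each "open", "closed" or "don't care"
    (hypothetical encoding of the three-position switches).
    """
    closed = sum(1 for p in problem if p == "closed")
    dont_care = sum(1 for p in problem if p == "don't care")
    return closed, closed + dont_care


def is_skipped(num_closed_states, min_setting, max_setting):
    """True if a tape function would be skipped under the Min/Max rule.

    min_setting=None stands for the "Normal" point, which disables skipping.
    """
    if min_setting is None:
        return False
    return not (min_setting <= num_closed_states <= max_setting)


problem = ["closed"] * 3 + ["don't care"] * 2 + ["open"] * 11
lo, hi = switch_settings(problem)
print(lo, hi)                      # Min = 3, Max = 5
print(is_skipped(6, lo, hi))       # a function closed in six states is skipped
```

The rule is safe because permuting the variables of a tape function does not change the number of states in which it is closed, so a function outside the range can never match the problem.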
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Both  experience  and  intuition  suggest  that  a  function 


of time f(t) which is bounded in amplitude range ($|f(t)| \le A$) and
in bandwidth (the spectrum vanishes for angular frequencies
greater than $\omega_0$) has bounded slope, a bounded second derivative,
etc., and that there is a certain minimum time required to go
from a maximum negative to a maximum positive amplitude. Indeed,
one feels that the maximum slopes, and higher derivatives,
and the fastest rise times will occur with a sine wave having
the highest allowed amplitude and the highest allowed frequency.
This note establishes some theorems of this general sort.

Theorem I: Let the function f(t), of integrable
square, be both amplitude limited and band limited:

    $|f(t)| \le A$    all t

    $F(\omega) = 0$    for $|\omega| > \omega_0$

where $F(\omega)$ is the Fourier transform of f(t). Then

    $|f'(t)| \le A\omega_0$
    $|f''(t)| \le A\omega_0^2$
    ...
    $|f^{(n)}(t)| \le A\omega_0^n$

for all t.


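Proof: If we can prove the theorem for a particular t,
it will follow for all t, since we can shift f(t) along
the time axis without affecting the assumptions of the
theorem or its conclusions. We will prove the theorem
for the particular time $t_1 = \pi/(2\omega_0)$. Now apply the sampling
theorem to f(t), expanding it in terms of its samples:

    $f(t) = \sum_{n=-\infty}^{\infty} a_n \frac{\sin(\omega_0 t - n\pi)}{\omega_0 t - n\pi}$

    $f'(t) = \sum_{n=-\infty}^{\infty} a_n \frac{\omega_0(\omega_0 t - n\pi)\cos(\omega_0 t - n\pi) - \omega_0 \sin(\omega_0 t - n\pi)}{(\omega_0 t - n\pi)^2}$

At $t = t_1$ we have $\omega_0 t_1 - n\pi = -(n - 1/2)\pi$, so the cosine
terms vanish, the sines have absolute value 1, and

    $|f'(t_1)| \le \frac{\omega_0}{\pi^2} \sum_{n=-\infty}^{\infty} \frac{|a_n|}{(n - 1/2)^2}$

since the absolute value on $a_n$ makes all terms positive.

Now $a_n$ is the value of f(t) at $t = n\pi/\omega_0$; consequently
$|a_n| \le A$. Hence

    $|f'(t_1)| \le \frac{A\omega_0}{\pi^2} \sum_{n=-\infty}^{\infty} \frac{1}{(n - 1/2)^2} = \frac{A\omega_0}{\pi^2}\,\pi^2 = A\omega_0$

This proves the desired result for the first derivative.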
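
The first-derivative bound comes out as exactly $A\omega_0$ only if the sum of $1/(n - 1/2)^2$ over all integers n equals $\pi^2$, assuming (as the $(n - 1/2)^2$ denominators in the proof suggest) that the derivative is evaluated midway between two sample points. The short Python check below confirms this numerically; the function name and the truncation points are choices made for this illustration.

```python
import math


def partial_sum(N):
    """Sum of 1/(n - 1/2)^2 for n = -N+1, ..., N (symmetric about n = 1/2)."""
    return sum(1.0 / (n - 0.5) ** 2 for n in range(-N + 1, N + 1))


# The partial sums approach pi^2 from below; the missing tail is about 2/N.
for N in (10, 1000, 100000):
    print(N, partial_sum(N), math.pi ** 2)
```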
The results for higher derivatives can be obtained
inductively. f'(t) is band-limited, of integrable square, and,
as we have just shown, amplitude limited to $A\omega_0$. Hence, f''
will be amplitude limited by:

    $|f''(t)| \le (A\omega_0)\omega_0 = A\omega_0^2$

and by obvious induction

    $|f^{(n)}(t)| \le A\omega_0^n$

It will be noted that these bounds are the maximum
derivatives that would be obtained for a sine wave of the
highest allowed amplitude and frequency, $f(t) = A \sin \omega_0 t$.
While such a wave does not satisfy our integrable square assumption,
it is possible to approximate the bounds given as
closely as desired by taking a sine wave of nearly top frequency
and nearly top amplitude and multiplying it by a very
slowly decaying function of the type $\frac{\sin kt}{kt}$ (k very small).
This produces a function satisfying all the conditions with
maximum derivatives approximating to the upper bounds given.
Consequently these bounds are the best possible.

We now consider the problem of total rise of a function
over an interval. Again we would conjecture that the shortest
time for a rise from negative peak to positive peak amplitude
would be obtained by use of a sine wave of the greatest allowed
frequency and amplitude and hence would be $\pi/\omega_0$ seconds. We have
not been able to prove a result quite this good but will show the
following:

Theorem II: Under the same conditions on f(t) as in
Theorem I, it takes at least $3\tfrac{1}{12}/\omega_0$ seconds for f(t) to
change from -A to +A.

Proof: We will show that if f(0) = -A, and $f(t_3) = +A$,
then f'(t) for $0 \le t \le t_3$ lies always under or on the
curve g(t) shown in Figure 1. This curve consists of
five sections: a straight line segment of slope $A\omega_0^2$, a
parabolic segment whose second derivative is $-A\omega_0^3$ and
which is tangent to the first segment and to the third
segment, a horizontal straight line at height $A\omega_0$. The
last two segments are reflections of the first two.

In the first place, if f(0) = -A, then f'(0) = 0,
for f(t) is an entire function because of the band limitations,
and if f'(0) were not equal to zero, f(t) would run
outside its amplitude limit A in the neighborhood of zero.
Now

    $f'(t) = f'(0) + \int_0^t f''(\tau)\,d\tau$

    $\le 0 + \int_0^t |f''(\tau)|\,d\tau$

    $\le \int_0^t A\omega_0^2\,d\tau = A\omega_0^2 t.$
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Hence  f 1 (t)  lies  under  or  on  the  sloping  straight  line 

section. Also $f'(t) \le A\omega_0$, so it lies under the horizontal
segment. Next we show that it cannot lie in the small
triangular shaped region T. Suppose in contradiction
that f'(t) did lie in this region, passing through a point
p at $t = t_0$ as shown. At $t_0$ we have either (A) $f''(t_0) \ge g'(t_0)$
or (B) $f''(t_0) < g'(t_0)$.

Assume first case (A). We may write

    $f'(t_2) = f'(t_0) + (t_2 - t_0) f''(t_0) + \int_{t_0}^{t_2} \int_{t_0}^{t} f'''(\tau)\,d\tau\,dt.$    (1)

We also have

    $g(t_2) = g(t_0) + (t_2 - t_0) g'(t_0) + \int_{t_0}^{t_2} \int_{t_0}^{t} g''(\tau)\,d\tau\,dt.$    (2)

The three right-hand members of (1) dominate the corresponding
members of (2). $f'(t_0) > g(t_0)$ since we assumed
$f'(t_0)$ in the triangular region. $f''(t_0) \ge g'(t_0)$ since
we are assuming case (A). $f'''(t) \ge g''(t)$ since the g curve
has the greatest negative second derivative allowed by
Theorem I. We conclude that $f'(t_2) > g(t_2)$, and the f'
curve is over the horizontal line at $t_2$, a contradiction
which excludes case (A).

A similar argument applies to case (B) working backward
to the point $t_1$. In equations (1) and (2), read $t_1$
for $t_2$ and notice that the coefficient $(t_1 - t_0)$ now becomes
negative. This allows the same argument to go through with
the condition reversed on the relation of $f''(t_0)$ and $g'(t_0)$,
and the resulting contradiction excludes case (B), which
shows the impossibility of a curve in the triangular region.
An exactly similar argument working backward from $t_3$
shows that f'(t) must lie under or on the right-hand sloping
line and curved segment. Now if f'(t) is always under g(t),
the area under f'(t) is at most that
under g(t). In order that f(t) run from -A at 0 to +A at $t_3$
the area under f'(t) must be at least 2A and hence so must that
under g(t). A simple integration of the g(t) curve shows that
this requires $t_3 \ge 3\tfrac{1}{12}/\omega_0$. This proves the desired result.
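
The "simple integration" can be checked with exact fractions. The sketch below works in the normalized units $\tau = \omega_0 t$ with heights measured in units of $A\omega_0$, so that the ramp has slope 1, the parabola has second derivative -1, and the flat section has height 1; the required area under g then becomes 2. The normalization, the variable names and the function names are assumptions made for this illustration. Tangency forces the ramp to end at $\tau = 1/2$ and the parabola to reach the flat section at $\tau = 3/2$.

```python
from fractions import Fraction as F


def parabola_antiderivative(tau):
    """Antiderivative of 1 - (tau - 3/2)^2 / 2, the normalized parabolic arc."""
    return tau - (tau - F(3, 2)) ** 3 / 6


def area_under_g(tau3):
    """Normalized area under g for total duration tau3 (needs tau3 >= 3)."""
    ramp = F(1, 2) * F(1, 2) * F(1, 2)          # triangle under the line on 0..1/2
    arc = parabola_antiderivative(F(3, 2)) - parabola_antiderivative(F(1, 2))
    flat = tau3 - 3                              # flat section between the two arcs
    return 2 * (ramp + arc) + flat               # last two sections mirror the first two


# Smallest tau3 for which the area reaches 2 (a rise of 2A).
ramp_and_arc = F(1, 8) + (parabola_antiderivative(F(3, 2)) - parabola_antiderivative(F(1, 2)))
tau3_min = 2 - 2 * ramp_and_arc + 3
print(tau3_min)                                  # 37/12, that is 3 1/12
print(area_under_g(tau3_min))                    # 2
```

For comparison, the conjectured value $\pi$ is about 3.1416 in the same units, slightly larger than 37/12, which is about 3.0833.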


It would no doubt be possible to improve the value
$3\tfrac{1}{12}$ by more elaborate arguments of the same general type,
finding better g(t) functions with properly bounded values of
$g'''(t)$, $g^{iv}(t)$, etc. It seems difficult however to obtain the
conjectured value by this method.

C. E. SHANNON
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Theorem: 

In  a  discrete  noisy  channel  without  memory,  the  rate  of 
transmission  R  is  a  concave  downward  function  of  the  probabilities 
Pi  of  the  input  symbols.    Hence,  any  local  maximum  of  R  will  be 
the  absolute  maximum  or  channel  capacity  C. 
Proof: We have

    $R = H(y) - H_x(y)$
    $= -\sum_i Q_i \log Q_i + \sum_i P_i \alpha_i$

where the $Q_i$ are the probabilities of the various received symbols
and $\alpha_i$ is the conditional entropy of the received symbol when the
transmitted symbol is the i-th one.

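A condition for concavity of R is that $\frac{\partial^2 R}{\partial P_j \partial P_k} = R_{jk}$
be a negative semi-definite form.* We have

    $\frac{\partial R}{\partial P_j} = -\sum_i (1 + \log Q_i)\, p_j(i) + \alpha_j$

using the fact that $Q_i = \sum_j P_j\, p_j(i)$.

    $R_{jk} = -\sum_i \frac{1}{Q_i}\, p_j(i)\, p_k(i)$

    $\sum_{jk} R_{jk}\, \Delta P_j\, \Delta P_k = -\sum_i \sum_{jk} \frac{1}{Q_i}\, p_j(i)\, p_k(i)\, \Delta P_j\, \Delta P_k$

    $= -\sum_i \frac{1}{Q_i} \Big(\sum_j p_j(i)\, \Delta P_j\Big) \Big(\sum_k p_k(i)\, \Delta P_k\Big)$

    $= -\sum_i \frac{(\Delta Q_i)^2}{Q_i}$    (1)

*See "Inequalities," Hardy, Littlewood and Polya, Cambridge 1934,
p. 80.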

This displays the sum as necessarily non-positive, since
all terms are non-positive, and consequently shows that $R_{jk}$ is
negative semi-definite and R a concave function. The simplicity
of the formula (1) for the second derivative of R in an arbitrary
direction is quite striking.
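
Formula (1) can be checked numerically on a small example. The Python sketch below compares a central second difference of R along a direction $\Delta P$ with $-\sum_i (\Delta Q_i)^2 / Q_i$. The two-input, two-output channel, the input probabilities, the direction and the step size h are arbitrary values chosen for illustration, natural logarithms are used, and the function names are not from the original note.

```python
import math


def received_probs(P, channel):
    """Q_i = sum_j P_j p_j(i), with channel[j][i] = p_j(i)."""
    return [sum(P[j] * channel[j][i] for j in range(len(P)))
            for i in range(len(channel[0]))]


def rate(P, channel):
    """Rate of transmission R in nats for input probabilities P."""
    Q = received_probs(P, channel)
    return sum(P[j] * channel[j][i] * math.log(channel[j][i] / Q[i])
               for j in range(len(P))
               for i in range(len(Q))
               if P[j] > 0 and channel[j][i] > 0)


def formula_1(P, dP, channel):
    """Right-hand side of (1): minus the sum of (dQ_i)^2 / Q_i."""
    Q = received_probs(P, channel)
    dQ = received_probs(dP, channel)      # dQ is linear in dP
    return -sum(dq * dq / q for dq, q in zip(dQ, Q))


def second_difference(P, dP, channel, h=1e-4):
    """Central-difference estimate of the second derivative of R along dP."""
    up = [p + h * d for p, d in zip(P, dP)]
    down = [p - h * d for p, d in zip(P, dP)]
    return (rate(up, channel) - 2 * rate(P, channel) + rate(down, channel)) / h ** 2


channel = [[0.9, 0.1], [0.2, 0.8]]
P, dP = [0.4, 0.6], [1.0, -1.0]
print(second_difference(P, dP, channel), formula_1(P, dP, channel))
```

The two printed numbers should agree to several decimal places, and both are negative, as the concavity argument requires.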

A corollary to this result is the following: Consider
the set s of points $(P_1, P_2, \ldots, P_n)$ with $\sum P_i = 1$ for which R
has its maximum value. Normally, of course, there is only one
point in the set, but in other cases it is not so limited. Our
theorem allows us to deduce that s is always a convex set of
points, for if R is maximized at $(P_1, \ldots, P_n)$ and also at
$(P_1', \ldots, P_n')$, it must clearly have the same value at
$(aP_1 + (1-a)P_1', \ldots, aP_n + (1-a)P_n')$.
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A  SKELETON  KEY  TO  THE  IBFQRKftTION  SEMIHAB  -  gOTES 


The material in these notes has not for the most part been
published and is for personal use only. The notes are not complete.
Several key sections are not yet available; consequently there are a
number of forward and backward references which are quite meaningless.

The remaining sections will be handed out as soon as available.

The parts of the notes now available are not arranged in the
correct order for easiest reading. The following rearrangement of sections
should be made:

Some Useful Inequalities for Distribution Functions - p. 1a - 3a

A Lower Bound on the Tail of a Distribution - p. 1y - 9y

A Combination Theorem - p. 1m

Some Results on Determinants - p. 1b - 3b

Upper and Lower Bounds for Powers of a Matrix with Non-negative Elements

The Number of Sequences of a Given Length
Characteristic for a Language with Independent Letters
The Probability of Error in Optimal Codes
Page with figures 1, 2 and 3

Zero Error Codes and the Zero Error Capacity - p. 1g - 6g

Theorem - p. 1h - 3h

Figure 4

Lower Bound for Ppf for a Completely Connected Channel - p. 2r - 3r

ad for f&  p. 1k - 5k

Application of "Sphere-packing" Bounds to Feedback Case - p. 1p - 3p
Theorem - p. 1q - 4q

Theorem - p. 1j

A Result for the Memoryless Feedback Channel - p. 1r

Continuity of Pp ppt as a function of transition probabilities - p. 1e

Codes of a fixed composition - p. 1f

Relation of P_e to n - p. 1i - 2i

Bound on P_e for Random Code by Simple Threshold Argument - si - ek

A bound on P_e for a random code - p. 1d - 3d



The Feinstein Bound - pages 11 & 21

Relations Between Reliability and Minimum Word Separation - p. l2 ( 22 , 62 & 72

Inequalities for Decodable Codes - p. 1n - 3n

Convexity of Channel Capacity as a Function of Transition
Probabilities - p. 1o

A Geometric Interpretation of Channel Capacity - p. 1x - 6x

Log Moment Generating Function for the Square of a
Gaussian Variate - p. p 1 - £2

Upper Bound on P_e for Gaussian Channel by Expurgated
Random Code - p. si - f2

Lower Bound on P_e in Gaussian Channel by Minimum
Distance Argument - p. al - a2

The Sphere Packing Bound for the Gaussian Power
Limited Channel - p. c 1 - e 5

The T-terminal Channel - p. .fl - 67

Conditions for Constant Mutual Information - p. 1066

Simple Proof - p. 1024

The  following  errata  have  been  found: 
p.  ly   line  10  >  1 

line  11    for  any  positive  <^ 

line  14        ^(1  -  ep"- 

p.  2y    line    8     V,  <Y2  <. . . .  7% 

p.  Jw  -  lines  1.  2.  4,  7,  8,  9,  13.  17  subscripts  on  $  should 
be  in  line. 
1 

p.  2c  -  line  7      *   log  Prob 

n  p 

4c  -  Eq.  (7)     E(8)  -  -^(s)  log  -    -  (ji  -  su«) 

Eq.  (8)      R(s)  =  £^(8)    log  qi(s)°1  »  n  -  («-l) 

line  6     dE  ,  dR  ^  -  n'  +  six"  +  n'    -   .  s 
ds  '  ds  *     n1  +  (1-s)  uM-u'  ~ 

line  2     E(l)  «  j        log  p^1    +    log  d 
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page  3|  -  line  3       -  log   min.  jT 

page  J*g  -  line  9     change  mar.  to  min. 
Fig.  4   bottom  line    -  change  3  to  2. 
page  5K    equation  (l)  min 

V  1°U  =  1 

I would appreciate knowing any further errors of any sort that
are found in the notes. I expect there are a good many there. I would
also be interested to know of any parts that are particularly difficult
to follow and perhaps need rewriting.


Claude  E.  Shannon 
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Bounds  or-  the  Teiis  of  Martingales  and  delated  Questions 

Claude E. Shannon
Department of Electrical Engineering
Department of Mathematics
and
Research Laboratory of Electronics
Massachusetts Institute of Technology
Cambridge, Massachusetts

This paper is concerned with the problem of overbounding the probability
that the sum of n dependent random variables exceeds a certain
quantity. Certain restrictions are assumed concerning the distribution
of the ith random variable $x_i$ conditional on the preceding random variables.
As an example, we might have a gambler playing some "system"
in which $x_i$ is his winning on the ith bet. Suppose he can choose any
distribution he desires for $x_i$ conditional on the preceding plays,
$P(x_i | x_{i-1}, \ldots, x_1)$, subject however to the conditions 1) it is a
"fair" bet, $E(x_i | x_{i-1}, \ldots, x_1) = 0$; 2) there is a "house limit" on
possible wins or losses for one bet, $P(x_i | x_{i-1}, \ldots, x_1) = 0$ for
$x_i < L$ and $P(x_i | x_{i-1}, \ldots, x_1) = 1$ for $x_i \ge W$. It is desired to find
an upper bound on the probability that the gambler's winnings will exceed
a certain limit X after n bets. This bound will of course be a function
of L, W, n and X but is to be independent of the system used.

Thought  of  another  way,  we  can  imagine  the  gambler  mapping  out  a 
strategy,  subject  to  the  house  rules,  to  try  to  maximize  the  probability 
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of  ending  up  after  n  bets  with  a  total  winning  of  X  or  more.   If  this  is 
his  object,  he  would  clearly  be  wise,  for  example,  if  he  ever  reached 
the  level  X  to  not  risk  any  future  loss.    This  he  could  do  by  choosing  a 
distribution function thereafter which is 0 for negative x and 1 for
positive x.

We  will  find  a  bound  for  this  problem  and  various  other  similar 
problems  with  different  side  constraints  on  the  allowed  distribution 
functions. The results have applications in various problems related
to random walks, gambler's ruin problems and certain coding problems
in information theory.

In the example above, the gambler's total capital forms a martingale
because of the "fair bet" condition. Bounds on the tails of martingales
are known in terms of the variances of the successive amounts won.
The bounds we obtain are in terms of conditional moment generating
functions. As such, they require more in the way of restrictions on
the distributions (for the moment generating functions to exist), but
give tighter bounds. Our bounds bear the same relation to the variance
type bounds for martingales that the Chernoff bound does to the
Chebycheff bound for sums of independent random variables.

The Main Inequality

The  method  we  use  is  based  on  a  bound  for  the  tail  of  a  distribution 
due to Chernoff. Let P(x) be the distribution function of a random
variable x, and suppose that its moment generating function

    $\nu(s) = \int_{-\infty}^{\infty} e^{sx}\,dP(x)$

exists over some s interval including the origin in its interior. This
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will  certainly  b'e  true,  for  example,  if  P(x)  <  eE::  for  some  a  >  0  and 
sufficiently large negative x, and $1 - P(x) < e^{-bx}$ for some positive b
and sufficiently large positive x.

We first derive a somewhat generalized formulation of the Chernoff
bound. Let $\mu(s) = \log \nu(s)$ be the semi-invariant generating function.

Lemma 1: Suppose the semi-invariant generating function $\mu(s)$, for a
random variable x, exists for $-a < s < b$ and does not exceed another
differentiable function of s, $\mu_0(s)$. Thus $\mu(s) \le \mu_0(s)$. Then

    $\Pr[x \ge \mu_0'(s)] \le e^{\mu_0(s) - s\mu_0'(s)}$        $b > s \ge 0$

    $\Pr[x \le \mu_0'(s)] \le e^{\mu_0(s) - s\mu_0'(s)}$        $-a < s \le 0$

This result is like the Chernoff bound except for replacement of $\mu(s)$
by an upper bounding function $\mu_0(s)$, and may be proved by similar means.
Thus by the generalized Chebycheff inequality

    $e^{sX} \Pr[x \ge X] \le \int_X^{\infty} e^{sx}\,dP(x)$        $s \ge 0$

    $\le \int_{-\infty}^{\infty} e^{sx}\,dP(x) = \nu(s) = e^{\mu(s)}$

    $\le e^{\mu_0(s)}$

This is true for any X. Set $X = \mu_0'(s)$. Then

    $\Pr[x \ge \mu_0'(s)] \le e^{\mu_0(s) - s\mu_0'(s)}$

A similar argument gives the dual inequality for negative s.

We now develop a formula for the moment generating function of the
sum of a set of dependent random variables, $u = x_1 + x_2 + \ldots + x_n$, where
the distribution function of $x_1, \ldots, x_n$ is given by

    $P(\xi_1, \xi_2, \ldots, \xi_n) = \Pr[x_1 \le \xi_1, x_2 \le \xi_2, \ldots, x_n \le \xi_n]$

It is assumed that for this multivariate distribution the moment generating
functions for various random variables conditional on others
exist. To avoid notational complexity we carry out the calculation only
for n = 3, using x, y and z for the three random variables, but the
method is clearly general. If $\nu(s)$ is the moment generating function
for the sum variable u = x + y + z, then (all integrals are from $-\infty$ to $\infty$):

    $\nu(s) = \int e^{sx}\,dP(x) \int e^{sy}\,dP(y|x) \int e^{sz}\,dP(z|x,y)$

The innermost integral is the moment generating function for z conditional
on x and y, and may be denoted by $\nu_3(s|x,y)$ (the 3 referring to
the third variable, z). Thus

    $\nu(s) = \int e^{sx}\,dP(x) \int e^{sy}\,dP(y|x)\; \nu_3(s|x,y)$

Suppose now that we have a bounding function for $\nu_3(s|x,y)$, say $\gamma_3(s)$,
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independent  of  x  and  y. 

    $\nu_3(s|x,y) \le \gamma_3(s)$

Then the innermost integral may be bounded by $\gamma_3(s)$ and this term
taken out of the integration. ($\nu_3(s|x,y)$ is clearly non-negative, being an
expectation of $e^{sz}$.) Thus

    $\nu(s) \le \gamma_3(s) \int e^{sx}\,dP(x) \int e^{sy}\,dP(y|x)$

Similarly, suppose the moment generating function of y conditional on
x is bounded by $\gamma_2(s)$

    $\nu_2(s|x) = \int e^{sy}\,dP(y|x) \le \gamma_2(s)$

and the moment generating function of x is bounded by $\gamma_1(s)$

    $\int e^{sx}\,dP(x) \le \gamma_1(s)$

Then these may also be used to bound the integrals, giving

    $\nu(s) \le \gamma_1(s)\,\gamma_2(s)\,\gamma_3(s)$

Taking logarithms, the semi-invariant generating function $\mu(s)$ for
the sum variable u is therefore bounded by the sum of the logarithms
of the $\gamma(s)$ functions, that is, by uniform bounds on the conditional
semi-invariant functions for the different variables

    $\mu(s) \le \mu_1(s) + \mu_2(s) + \mu_3(s)$
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The  same  argument  carries  through  for  the  sua  of  any  number  of  ran- 
dom variables  and  may  be  summarised  as  follows. 

Lemma 2: The semi-invariant generating function $\mu(s)$ for the sum
of n random variables is bounded by

    $\mu(s) \le \sum_{i=1}^{n} \mu_i(s)$

where $\mu_i(s)$ is a uniform bound on the semi-invariant function for the
ith random variable conditional on the first i - 1:

    $\log \int e^{sx_i}\,dP(x_i | x_1, x_2, \ldots, x_{i-1}) \le \mu_i(s).$


In most applications the same bound, say $\mu_0(s)$, will apply to all
the random variables. In this case $\mu(s) \le n\mu_0(s)$. Combining Lemmas
1 and 2 we obtain our first main result, a bound on the tail probability
of a sum of dependent random variables provided the conditional moment
generating functions exist.

Theorem 1: If u is the sum of n dependent random variables
$x_i$ (i = 1, 2, ..., n) whose semi-invariant generating functions conditional
on preceding variables $\mu(s | x_1, \ldots, x_{i-1})$ exist and are bounded by
differentiable functions $\mu_i(s)$ (i = 1, 2, ..., n), then

    $\Pr[u \ge \sum_i \mu_i'(s)] \le e^{\sum_i \mu_i(s) - s \sum_i \mu_i'(s)}$        $s \ge 0$

    $\Pr[u \le \sum_i \mu_i'(s)] \le e^{\sum_i \mu_i(s) - s \sum_i \mu_i'(s)}$        $s \le 0$
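
As a sanity check on the form of Theorem 1, the Python sketch below applies it to the simplest special case: n independent fair bets of plus or minus one unit. Independence is only a particular case of the dependence allowed by the theorem, and here each conditional semi-invariant generating function equals log cosh s exactly, so $\mu_i(s) = \log\cosh s$ can serve as the bounding function. The choice of example, the function names and the values of n and s are assumptions made for this illustration.

```python
import math


def exact_tail(n, threshold):
    """Pr[u >= threshold] for u the sum of n independent +1/-1 fair bets."""
    favorable = sum(math.comb(n, k) for k in range(n + 1) if 2 * k - n >= threshold)
    return favorable / 2 ** n


def theorem_bound(n, s):
    """Level n*mu'(s) and bound exp(n*[mu(s) - s*mu'(s)]) with mu = log cosh."""
    mu = math.log(math.cosh(s))
    mu_prime = math.tanh(s)
    return n * mu_prime, math.exp(n * (mu - s * mu_prime))


n = 50
for s in (0.1, 0.5, 1.0):
    level, bound = theorem_bound(n, s)
    print(f"s={s}: level={level:.2f}  exact={exact_tail(n, level):.3e}  bound={bound:.3e}")
```

In each row the exact tail probability should not exceed the bound; the gap reflects the missing coefficient term discussed later in the paper.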
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Applications 

In applications of this result we would generally attempt to find the
smallest bounding functions $\mu_i(s)$ in order to obtain the tightest bound on the
tail probability. As a first example consider a gambler allowed to choose
a wager with an arbitrary distribution function $\phi(x)$ (the probability of
gaining x or less), subject however to the following conditions:

1) The expected gain is zero. $\int x\,d\phi(x) = 0$

2) $\phi(x) \le \phi_1(x)$ where $\phi_1(x)$ is a distribution function with negative
mean for which $\int e^{sx}\,d\phi_1(x)$ exists for some negative s.

3) $\phi(x) \ge \phi_2(x)$ where $\phi_2(x)$ is a distribution function with positive
mean for which $\int e^{sx}\,d\phi_2(x)$ exists for some positive s and $\phi_2(x) \le \phi_1(x)$.

Thus our gambler is allowed to choose a distribution function at
each wager lying between two given curves $\phi_1(x)$ and $\phi_2(x)$ (as suggested
by Fig. 1)


Fig.  1. 


which  approach  0  and  1  with  a  certain  rapidity.   He  is  also  constrained 
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to  choose  a  distribution  function  with  zero  mean.    The  situation  described 
earlier involving house limits is a case of this type where the distributions
$\phi_1$ and $\phi_2$ are step functions at L and W, the maximum allowed
loss or win per wager.

To apply the theorem we need a function which bounds the moment
generating functions which he can achieve with these restrictions. Consider
the distribution function $\phi_0(x)$ defined as follows:

    $\phi_0(x) = \phi_1(x)$        $x < \alpha$
    $\phi_0(x) = k$        $\alpha \le x \le \beta$
    $\phi_0(x) = \phi_2(x)$        $x > \beta$

where $\alpha$ is the first point at which $\phi_1(x)$ reaches the value k and $\beta$ is
the first point at which $\phi_2(x)$ reaches k. $\phi_0(x)$ is a distribution function,
and by adjusting k we can clearly make the mean of the distribution $\phi_0(x)$
equal zero. With this value of k we will show that the moment generating
function for any allowed $\phi(x)$ is bounded by that for $\phi_0(x)$.

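Since $\phi(x)$ and $\phi_0(x)$ have the same mean (namely zero), we have,
integrating by parts,

    $0 = \int_{-\infty}^{\infty} x\,d(\phi_0(x) - \phi(x)) = x(\phi_0(x) - \phi(x))\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} (\phi_0(x) - \phi(x))\,dx$

    $\int_{-\infty}^{\infty} (\phi_0(x) - \phi(x))\,dx = 0$

where we use the exponential approach of $\phi$ and $\phi_0$ to 0 and 1 as x goes
to $-\infty$ and $+\infty$ to insure the vanishing of the term $x(\phi_0(x) - \phi(x))$ at these limits.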
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Now  consider  the' quantit-f  fa~H«  „c. 

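$\int_{-\infty}^{\infty} e^{sx}\,d(\phi_0(x) - \phi(x))$ (again using integration by parts):

    $\int_{-\infty}^{\infty} e^{sx}\,d(\phi_0 - \phi) = -s \int_{-\infty}^{\delta} e^{sx}(\phi_0 - \phi)\,dx - s \int_{\delta}^{\infty} e^{sx}(\phi_0 - \phi)\,dx$        $-a < s < b$

-a and b are the limits of existence of the moment generating
functions and $\delta$ is the first point at which $\phi(x)$ reaches the
horizontal segment of $\phi_0$, so that $\phi_0(x) - \phi(x)$ is positive (or zero)
for $x < \delta$ and negative (or zero) for $x > \delta$.

The first term $-s \int_{-\infty}^{\delta} e^{sx}(\phi_0 - \phi)\,dx$ is greater than or equal to
$-s e^{\delta s} \int_{-\infty}^{\delta} (\phi_0 - \phi)\,dx$, since, when s is positive, $e^{\delta s} \ge e^{sx}$ for
$x < \delta$, $\phi_0 - \phi$ is positive and the coefficient -s is negative. If s
is negative, $e^{\delta s} \le e^{sx}$ in this range and the coefficient -s is
positive. In a similar way, the second term $-s \int_{\delta}^{\infty} e^{sx}(\phi_0 - \phi)\,dx$
is greater than or equal to $-s e^{\delta s} \int_{\delta}^{\infty} (\phi_0 - \phi)\,dx$, as one verifies by
examination of the two cases $s \ge 0$ and $s < 0$, remembering that $\phi_0(x) - \phi(x)$
is negative or zero in this range. Thus we conclude

    $\int_{-\infty}^{\infty} e^{sx}\,d(\phi_0 - \phi) \ge -s e^{\delta s} \int_{-\infty}^{\infty} [\phi_0(x) - \phi(x)]\,dx = 0$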
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In  other  words,  the  moment  generating  function  for  the  distribution 
$\phi_0(x)$ dominates that of any other distribution with the same mean as $\phi_0$
and bounded by the $\phi_1$ and $\phi_2$ curves. Therefore the moment generating
function for $\phi_0$ may be used in our bounds for the tail of a sum distribution
if the individual conditional distributions satisfy this type of restriction.

Using this bound on the conditional moment generating functions in
Theorem 1 our solution may be summarized as follows. Suppose at each
play of a game the distribution functions available to a gambler all have
zero mean and lie between two functions $\phi_1(x)$ and $\phi_2(x)$. Let $\phi_0(x)$ be
the zero mean function consisting of $\phi_1$ followed by a flat segment,
followed by $\phi_2$. Let

    $\mu(s) = \log \int_{-\infty}^{\infty} e^{sx}\,d\phi_0(x).$

Then the probability of his winnings after n wagers exceeding $n\mu'(s)$ is

    $\Pr[u \ge n\mu'(s)] \le e^{n[\mu(s) - s\mu'(s)]}$        $s \ge 0$

This  same  bound  applies,  of  course,  also  with  a  semi-martingale 
condition,  that  is,  if  the  gambler's  expectation  is  only  required  to  be 
non-positive. 

If $\phi_1(0) = 1$ and $\phi_2(0) = 0$ (so the gambler can play a wager that amounts
to stopping the game, that is, a distribution which is a unit step at zero),
then this same bound applies to the probability of exceeding $n\mu'(s)$ on any
of the first n trials. This is because the bound covers all strategies.
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Any  particular  strategy  could  be  modified  so  that  if  the  gambler  reaches 
the level $n\mu'(s)$ at any time before the nth trial he then effectively holds
his winnings by playing the distribution with unit step at zero. The bound
must exceed the probability of exceeding the level $n\mu'(s)$ for this strategy
at the nth step but this is a bound on the probability of ever exceeding the
level in the first n steps. This device can be used in many applications
of the method we are describing, provided only that the unit step at zero
is an allowed distribution function.

The bound given, while certainly not the best possible for all values
of the parameters, is, however, best possible in the coefficient of n in
the exponent. That is, the result would be false if $\mu(s) - s\mu'(s)$ in the
right hand exponential term were replaced by $\mu(s) - s\mu'(s) - \epsilon$ for any
positive $\epsilon$. This may be seen as follows. The gambler could, within the
rules, choose the distribution $\phi_0(x)$ at each wager. If he does so, then
we have a sum of n independent random variables, each with semi-invariant
generating function $\mu(s)$. Lower bounds on the tail of this sum
distribution are known to exceed $e^{n[\mu(s) - s\mu'(s) - \epsilon]}$ when n is sufficiently
large.

The  Case  with  House  Limits  on  Win  or  Loss  for  each  Wager 

For the case of the gambler who can choose an arbitrary distribution
with zero mean and house limits on wins and losses W and L (L<0)
respectively, the distribution to maximize $\mu(s)$ is, from the above analysis,
a binomial distribution with jumps at the ends of the interval W and L
adjusted to give a zero mean. The two probabilities are $\frac{W}{W-L}$ at L and
$\frac{-L}{W-L}$ at W.
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To  gain  a  little  in  generality  and  siinpiixy  notation,  consider  a  binomial 
with probability p at value L and probability q = (1 - p) at W. The semi-invariant
generating function is

    $\mu(s) = \log(p e^{sL} + q e^{sW})$

    $\mu'(s) = \frac{pL e^{sL} + qW e^{sW}}{p e^{sL} + q e^{sW}}$

The  expression  for  the  bound  on  the  tail  may  be  simplified  by  a  change  of 
variables  eliminating  s.  Let 


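$$\lambda = \frac{pe^{sL}}{pe^{sL} + qe^{sW}}, \qquad \eta = 1 - \lambda = \frac{qe^{sW}}{pe^{sL} + qe^{sW}}$$

Then

$$\frac{\lambda}{\eta} = \frac{p}{q}\,e^{s(L-W)}$$

$$\mu'(s) = \lambda L + \eta W$$

$$\mu - s\mu'(s) = \log\left(pe^{sL} + qe^{sW}\right) - s(\lambda L + \eta W)$$

$$= \log\left(pe^{sL} + qe^{sW}\right) - \frac{\lambda L + \eta W}{L - W}\,\log\frac{\lambda q}{\eta p}$$

$$= \lambda \log\frac{p}{\lambda} + \eta \log\frac{q}{\eta}$$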


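Letting p equal $\frac{W}{W-L}$ and q equal $\frac{-L}{W-L}$ and using our result bounding
the tail of the sum of n random variables, we obtain the following bound
for the probability of the gambler exceeding a certain level after n wagers:

$$\Pr[u \ge n(\lambda L + \eta W)] \le \left[\left(\frac{W}{(W-L)\lambda}\right)^{\lambda}\left(\frac{-L}{(W-L)\eta}\right)^{\eta}\right]^{n}, \qquad \lambda \le p, \quad \eta = 1 - \lambda$$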


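If L = -W, that is, the win and loss limits are the same, this formula
can be simplified somewhat at the expense of a certain weakening. It then
becomes

$$\Pr[u \ge nW(1 - 2\lambda)] \le \left[\frac{\lambda^{-\lambda}\,\eta^{-\eta}}{2}\right]^{n}$$

Let $\lambda = \tfrac{1}{2}(1 - \theta)$, $\eta = \tfrac{1}{2}(1 + \theta)$.

Then

$$\Pr[u \ge nW\theta] \le \left[(1+\theta)^{-(1+\theta)}(1-\theta)^{-(1-\theta)}\right]^{n/2} = \exp\left\{-\frac{n}{2}\left[(1+\theta)\ln(1+\theta) + (1-\theta)\ln(1-\theta)\right]\right\}$$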
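Consider the bracketed term in the exponent and expand the logarithms
as series.

$$(1+\theta)\ln(1+\theta) + (1-\theta)\ln(1-\theta) = (1+\theta)\left(\theta - \frac{\theta^2}{2} + \frac{\theta^3}{3} - \frac{\theta^4}{4} + \cdots\right) + (1-\theta)\left(-\theta - \frac{\theta^2}{2} - \frac{\theta^3}{3} - \frac{\theta^4}{4} - \cdots\right)$$

$$= 2\left(\theta^2 + \frac{\theta^4}{3} + \frac{\theta^6}{5} + \cdots\right) - 2\left(\frac{\theta^2}{2} + \frac{\theta^4}{4} + \frac{\theta^6}{6} + \cdots\right)$$

$$= \theta^2 + \frac{\theta^4}{6} + \frac{\theta^6}{15} + \cdots + \frac{\theta^{2k}}{k(2k-1)} + \cdots \;\ge\; \theta^2$$

Hence

$$\Pr[u \ge nW\theta] \le e^{-n\theta^2/2}, \qquad \theta \ge 0$$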

It may be noted that this bound is similar to the exponential part of
the normal approximation to the sum of n binomial samples, probabilities
1/2 at ±W, without, however, the coefficient term that would ordinarily
appear. This might suggest that the gambler's best strategy to maximize
the probability of exceeding $nW\theta$ would be to continually play the extreme
binomial distribution, or at least until he was within W of it and then
switch to a binomial which would just carry him over the limit if he won.
While this appears to be a rather good strategy, it is not quite optimal
as a study of small n values reveals. Determining the optimal strategy
appears to involve considerable combinatorial complexity.
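
As a numerical illustration (not part of the original analysis), the bound $e^{-n\theta^2/2}$ can be compared with the exact tail probability when the gambler simply plays the extreme binomial at every wager, winning or losing W with probability 1/2 each. The function names and the sample values of n and $\theta$ below are assumptions chosen for this sketch.

```python
import math

def exact_tail(n, theta):
    """P[u >= n*W*theta] when every wager is +W or -W with probability 1/2.

    With k wins the sum is W*(2k - n), so the event is k >= n*(1 + theta)/2.
    """
    k_min = math.ceil(n * (1 + theta) / 2 - 1e-12)  # guard against float noise
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

def exponential_bound(n, theta):
    """The bound exp(-n*theta^2/2) derived in the text."""
    return math.exp(-n * theta ** 2 / 2)

if __name__ == "__main__":
    for n, theta in [(10, 0.2), (50, 0.2), (100, 0.3)]:
        print(n, theta, exact_tail(n, theta), exponential_bound(n, theta))
```

In each sampled case the exact tail lies below the bound, and the gap reflects the missing coefficient term mentioned above.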

The  Probability  of  ever  exceeding  a  Limit  with  a  Negative  Expectation 

Suppose now that the conditional expectation of all wagers is negative
and we are interested in a bound on the probability of ever (in an infinite
series of wagers) exceeding a certain (positive) value. If the expectation
were zero, then by well known results in the gambler's ruin problem the
only bound is unity, provided, for example, the gambler can play a binomial
distribution. With a negative mean, however, significant bounds can be
obtained as follows.

We consider the case again where the allowed distribution functions
must lie between two given distribution functions $\phi_1(x)$ and $\phi_2(x)$ but now
must have a mean m < 0. The maximum $\mu(s)$ is obtained by the same
construction using $\phi_1$ and $\phi_2$, but with a placement of the horizontal
segment to give the mean m.

If $\phi(0)$ is 1, then $\phi_2(0)$ must have been 1, and no allowed bet whatever
will ever give a positive return. Thus clearly the probability of ever
exceeding any positive bound is zero. We will therefore assume that
$\phi(0) < 1$. This assumption also excludes $\phi(x)$ being a unit step, since the
step would have to occur at the negative number m, making $\phi(0)$ equal 1.

Under the assumption $\phi(0) < 1$, the $\mu(s)$ curve has the general form
shown in Fig. 2.

The curve is convex downward; it passes through zero at s = 0 with
a negative slope m; it has a unique minimum at $s = s_1$ (say); and passes
through zero again at $s_0 > s_1$. These facts follow readily from the
relations

$$\nu(s) = \int e^{sx}\,d\phi(x), \qquad \nu(0) = \int d\phi(x) = 1$$

$$\nu'(s) = \int x e^{sx}\,d\phi(x), \qquad \nu'(0) = \int x\,d\phi(x) = m$$

$$\nu''(s) = \int x^2 e^{sx}\,d\phi(x)$$

$$\mu(s) = \ln \nu(s), \qquad \mu(0) = 0$$

$$\mu'(s) = \frac{\nu'(s)}{\nu(s)}, \qquad \mu'(0) = m$$

$$\mu''(s) = \frac{\nu(s)\,\nu''(s) - [\nu'(s)]^2}{[\nu(s)]^2}$$

The numerator of $\mu''(s)$ is positive by using the Schwarz inequality
(the unit step which would give zero being excluded). Hence the $\mu$ curve
is strictly convex downward. Also, for sufficiently large positive s,
$\nu(s)$ will exceed 1 and $\mu(s)$ will be positive, since $\phi(0) < 1$.
Consequently, the minimum at $s = s_1$ and the positive zero crossing at
$s = s_0$ both exist.


Suppose we are interested in a bound on the probability of ever
reaching or exceeding A with the sums $u_1 = x_1$, $u_2 = x_1 + x_2$, ...,
$u_n = x_1 + x_2 + \cdots + x_n$, .... We have

$$\Pr[\text{any } u_n \ge A] \le \sum_{n} \Pr[u_n \ge A]$$

From our above results $\Pr[u_n \ge A] \le e^{n[\mu(s) - s\mu'(s)]}$ for the s such that
$A = n\mu'(s)$. The particular n for which this bound is largest may be
obtained by maximizing $n[\mu(s) - s\mu'(s)]$ given $A = n\mu'(s)$, or, in other
words, maximizing $A\left[\frac{\mu(s)}{\mu'(s)} - s\right]$. Since $\mu''(s) > 0$ this maximum exists
and occurs at a unique s found by differentiation, namely, the s for which
$\mu(s) = 0$. This s is the $s_0$ of Fig. 2, and the corresponding n we call
$n_0$. Thus $s_0$ and $n_0$ satisfy

$$\mu(s_0) = 0, \qquad n_0\,\mu'(s_0) = A$$
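
As a concrete illustration (an example chosen here, not taken from the original), the root $s_0$ can be located numerically. The sketch below assumes a hypothetical wager of +1 with probability p and -1 with probability 1 - p, with p < 1/2 so that the mean is negative; the function names are likewise assumptions. Substituting $\mu(s_0) = 0$ and $n_0\mu'(s_0) = A$ into $e^{n[\mu(s) - s\mu'(s)]}$ gives $e^{-s_0 A}$, which is what the last function evaluates.

```python
import math

def mu(s, p):
    """Semi-invariant generating function of one wager: +1 w.p. p, -1 w.p. 1-p."""
    return math.log(p * math.exp(s) + (1 - p) * math.exp(-s))

def positive_root(p, hi=50.0, tol=1e-12):
    """Find s0 > 0 with mu(s0) = 0 by bisection (requires p < 1/2)."""
    lo = 1e-9  # mu is negative just to the right of zero and positive for large s
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mu(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def bound_at_s0(p, A):
    """exp(-s0*A): the value of exp(n[mu - s mu']) at s = s0, n = n0."""
    return math.exp(-positive_root(p) * A)

if __name__ == "__main__":
    print(positive_root(0.4), bound_at_s0(0.4, 5))
```

For this particular walk $s_0 = \log\frac{1-p}{p}$ in closed form, so $e^{-s_0 A} = \left(\frac{p}{1-p}\right)^A$, which is the classical gambler's-ruin value for the probability of ever reaching A.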

In general, $n_0$ will not be an integer, but the bound obtained for
evaluation at $n_0$ and $s_0$ certainly is greater than that for any integer
points. Hence for any particular n,

$$\Pr[u_n \ge A] \le e^{n_0[\mu(s_0) - s_0\mu'(s_0)]} = e^{-s_0 A}$$

Now consider the $s_1$ where $\mu'(s_1) = 0$ (Fig. 2) and $n_1$ defined by

$$n_1\,\mu(s_1) = -s_0 A$$

Again, in general, $n_1$ will not be an integer. We let, however, $[n_1]$ denote
the largest integer contained in $n_1$.

Returning to our inequality on the probability of $u_n$ ever exceeding A
we may rewrite as follows. In the second sum each term is bounded by
$\Pr[u_n \ge 0] \le e^{n\mu(s_1)}$, the bound at the s for which $\mu'(s) = 0$.

$$\Pr[\text{any } u_n \ge A] \le \sum_{n=1}^{\infty} \Pr[u_n \ge A]$$

$$= \sum_{n=1}^{[n_1]} \Pr[u_n \ge A] + \sum_{n=[n_1]+1}^{\infty} \Pr[u_n \ge A]$$

$$\le n_1 e^{-s_0 A} + \sum_{n=[n_1]+1}^{\infty} e^{n\mu(s_1)}$$

$$\le n_1 e^{-s_0 A} + \frac{e^{n_1\mu(s_1)}}{1 - e^{\mu(s_1)}}$$

$$\Pr[\text{any } u_n \ge A] \le e^{-s_0 A}\left[n_1 + \frac{1}{1 - e^{\mu(s_1)}}\right]$$


This is our desired bound. It is essentially exponentially decreasing
in A. In fact more refined analysis can be given to show that the bracketed
term can be replaced by a more involved expression which does not increase
with A.

Some Useful Inequalities for Distribution Functions

In this section a number of inequalities will be given which are
useful in estimating the "tails" of distribution functions or other
related statistics.

Binomial Inequalities: Let

$$G = \frac{1}{\sqrt{2\pi\lambda\mu n}}\;\frac{1}{\lambda^{\lambda n}\mu^{\mu n}} \qquad (1)$$

Then

$$G \exp\left[-\left(\frac{1}{12\lambda n} + \frac{1}{12\mu n}\right)\right] \le \binom{n}{\lambda n} \le G \qquad (2)$$

and

$$\frac{\sqrt{\pi}}{2}\,G \le \binom{n}{\lambda n}$$

where $\mu = 1 - \lambda$, and neither $\lambda$ nor $\mu$ is zero. (Note that if either is
zero, G is undefined.) Similar inequalities hold for the terms of a
binomial distribution, $\binom{n}{\lambda n}p^{\lambda n}q^{\mu n}$, and may be obtained by multiplying
the above inequalities by $p^{\lambda n}q^{\mu n}$. They may also be generalized to the
multinomial coefficient:

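$$G_1 = \frac{1}{(2\pi n)^{(s-1)/2}\sqrt{\lambda_1\lambda_2\cdots\lambda_s}}\;\frac{1}{\prod_i \lambda_i^{\lambda_i n}} \qquad (3)$$

$$G_1 \exp\left[-\sum_i \frac{1}{12\lambda_i n}\right] \le \frac{n!}{\prod_i (\lambda_i n)!} \le G_1 \qquad (4)$$

where s is the number of components, $\sum_i \lambda_i = 1$ and none of the $\lambda_i$ vanishes.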

The "tail" of a binomial distribution may be estimated by the following
formulas:

$$\sum_{k \ge \lambda n} \binom{n}{k}p^k q^{n-k} \le \frac{1}{1 - \frac{\mu p}{\lambda q}}\;G\,p^{\lambda n}q^{\mu n}, \qquad \lambda > p \qquad (5)$$

$$\sum_{k \ge \lambda n} \binom{n}{k}p^k q^{n-k} \le \left(\frac{p}{\lambda}\right)^{\lambda n}\left(\frac{q}{\mu}\right)^{\mu n}, \qquad \lambda > p \qquad (6)$$

The first of these gives a closer estimate of the tail but is somewhat
more complex. The inequality (6) (Chernoff) is often convenient because
of its simplicity. Lower bounds for tails may be taken to be merely lower
bounds for the first term as in the lower inequalities of (2) or (4).
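
The following sketch (an illustration added here, with function names chosen for the example) checks the Stirling-type bounds on a binomial coefficient and the Chernoff form of the tail bound numerically. It assumes $G = 1/\left(\sqrt{2\pi\lambda\mu n}\,\lambda^{\lambda n}\mu^{\mu n}\right)$ with $\lambda = k/n$ and $\mu = 1 - \lambda$, so that $1/(12\lambda n) = 1/(12k)$.

```python
import math

def G(n, k):
    """Stirling-type estimate of C(n, k), with lam = k/n and mu = 1 - lam."""
    lam, mu = k / n, (n - k) / n
    return 1.0 / (math.sqrt(2 * math.pi * lam * mu * n) * lam ** k * mu ** (n - k))

def coefficient_bounds(n, k):
    """Return (lower, exact, upper) for C(n, k), 0 < k < n."""
    g = G(n, k)
    lower = g * math.exp(-(1 / (12 * k) + 1 / (12 * (n - k))))
    return lower, math.comb(n, k), g

def binomial_tail(n, k, p):
    """Exact P[X >= k] for X ~ Binomial(n, p)."""
    q = 1 - p
    return sum(math.comb(n, j) * p ** j * q ** (n - j) for j in range(k, n + 1))

def chernoff_bound(n, k, p):
    """(p/lam)^(lam n) * (q/mu)^(mu n), intended for lam = k/n > p."""
    lam, mu = k / n, (n - k) / n
    return (p / lam) ** k * ((1 - p) / mu) ** (n - k)

if __name__ == "__main__":
    print(coefficient_bounds(20, 7))
    print(binomial_tail(20, 14, 0.5), chernoff_bound(20, 14, 0.5))
```

The coefficient bounds are tight to within a few percent even for moderate n, while the Chernoff form trades accuracy for simplicity, as the text notes.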

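We shall now prove the inequalities (1) and (2). The Stirling
approximation for n! is as follows:

$$n! \sim \sqrt{2\pi}\;n^{n+1/2}e^{-n}\exp\left(\frac{1}{12n} - \frac{1}{360n^3} + \cdots\right)$$

It is known that if no terms of the series are taken, n! is underestimated,
if only the $\frac{1}{12n}$ term is taken, then n! is overestimated, and so on. We
wish to overestimate $n!/[(\lambda n)!(\mu n)!]$. This will be done if the numerator
is overestimated and the denominator underestimated. Thus we may write

$$n! \le (2\pi)^{1/2}\,n^{n+1/2}\,e^{-n}\,e^{1/(12n)}, \qquad m! \ge (2\pi)^{1/2}\,m^{m+1/2}\,e^{-m}\exp\left(\frac{1}{12m} - \frac{1}{360m^3}\right) \quad (m = \lambda n,\ \mu n)$$

or

$$\frac{n!}{(\lambda n)!\,(\mu n)!} \le \frac{1}{\sqrt{2\pi\lambda\mu n}}\;\frac{1}{\lambda^{\lambda n}\mu^{\mu n}}\;\exp\left(\frac{1}{12n} - \frac{1}{12\lambda n} - \frac{1}{12\mu n} + \frac{1}{360(\lambda n)^3} + \frac{1}{360(\mu n)^3}\right)$$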

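We wish to show that the exp term is less than or equal to one, or, which
is the same thing, that its argument is less than or equal to zero.
One or the other of $\lambda$, $\mu$ is the greater. From symmetry, we may assume
without loss of generality that it is $\lambda$, that is, $\lambda \ge \mu$. Then

$$\frac{1}{360(\lambda n)^3} \le \frac{1}{360(\mu n)^3}$$

and since $\mu n$ is a positive integer,

$$\frac{1}{360(\mu n)^3} \le \frac{1}{360\,\mu n}$$

Further, $\frac{1}{12n} - \frac{1}{12\lambda n} \le 0$, since $\lambda n \le n$. Using these, we have

$$\frac{1}{12n} - \frac{1}{12\lambda n} - \frac{1}{12\mu n} + \frac{1}{360(\lambda n)^3} + \frac{1}{360(\mu n)^3} \le \left(\frac{1}{12n} - \frac{1}{12\lambda n}\right) - \left(\frac{1}{12\mu n} - \frac{1}{180\mu n}\right) \le 0$$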

This proves the upper bound (2). The lower bound is found similarly by
underestimating the numerator and overestimating the denominator. No
terms of the series are used for n! and the $\frac{1}{12\lambda n}$ and $\frac{1}{12\mu n}$ for the
denominator terms. This gives directly

$$\binom{n}{\lambda n} \ge G \exp\left[-\left(\frac{1}{12\lambda n} + \frac{1}{12\mu n}\right)\right]$$

The other lower bound with $\sqrt{\pi}/2$ in place of the exp term is obtained
by noting first that unless both $\lambda n$ and $\mu n$ are less than or equal to two,
the quantity $\frac{1}{12\lambda n} + \frac{1}{12\mu n}$ in the exponential is at most $\frac{1}{12} + \frac{1}{36} = \frac{1}{9}$.
Now $\exp(-\frac{1}{9}) > \sqrt{\pi}/2$, and it is also readily verified that for the four
cases where both $\lambda n$ and $\mu n$ do not exceed two, namely (2,2), (2,1), (1,2)
and (1,1), the result is true. The worst case is (1,1) which just
gives equality. Hence the result is true in general.

The upper and lower bounds (4) for the multinomial are found in
exactly the same way as for the binomial.

The tail inequality for the binomial is found by overestimating the
tail using an infinite geometric series. This process is familiar
(see, for example, Feller) with G replaced by the binomial coefficient.
The inequality (6) is a special case of Chernoff's inequality which will
be discussed later more generally.


A Lower Bound on the Tail of a Distribution

Let $\mu(s)$ be the logarithm of the moment-generating function of a
distribution F(x), and assume $\mu(s)$ exists in an interval with s = 0 in
its interior. Then

$$dF(x) = e^{\mu(s)}\,e^{-sx}\,dG(x) \qquad (1)$$

where G(x) is the distribution of the tilted random variable obtained from
F(x) by the $e^{sx}$ multiplying operation and normalization. G(x) has its mean
at $\mu'(s)$ and its variance is $\mu''(s)$.

By the Chebycheff inequality

$$G\left(\mu'(s) + \alpha\sqrt{\mu''(s)}\right) - G\left(\mu'(s) - \alpha\sqrt{\mu''(s)} - 0\right) \ge 1 - \frac{1}{\alpha^2}$$

for any positive $\alpha$. Now integrate equation (1) from $\mu'(s) - \alpha\sqrt{\mu''(s)} - 0$
to $\mu'(s) + \alpha\sqrt{\mu''(s)}$. This gives

$$F\left(\mu' + \alpha\sqrt{\mu''}\right) - F\left(\mu' - \alpha\sqrt{\mu''} - 0\right) = e^{\mu(s)}\int e^{-sx}\,dG(x) \ge \left(1 - \frac{1}{\alpha^2}\right)e^{\mu(s) - s\mu'(s) - |s|\alpha\sqrt{\mu''(s)}} \qquad (2)$$

This, then, is a lower bound on the probability for the F distribution in a
small interval in terms of the logarithm of the moment generating function.
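
The tilting operation can be made concrete for a discrete distribution. The sketch below is an illustration with hypothetical values and function names: it forms the tilted probabilities $q_i = p_i e^{sv_i - \mu(s)}$ and compares the mean and variance of the tilted distribution with finite-difference estimates of $\mu'(s)$ and $\mu''(s)$.

```python
import math

def tilt(values, probs, s):
    """Return (mu, q): mu(s) = log sum_i p_i e^{s v_i} and the tilted
    probabilities q_i = p_i e^{s v_i} / sum_j p_j e^{s v_j}."""
    m = sum(p * math.exp(s * v) for v, p in zip(values, probs))
    return math.log(m), [p * math.exp(s * v) / m for v, p in zip(values, probs)]

def mean_var(values, probs):
    """Mean and variance of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, var

def mu_derivatives(values, probs, s, h=1e-4):
    """Central finite-difference estimates of mu'(s) and mu''(s)."""
    f = lambda t: tilt(values, probs, t)[0]
    d1 = (f(s + h) - f(s - h)) / (2 * h)
    d2 = (f(s + h) - 2 * f(s) + f(s - h)) / h ** 2
    return d1, d2

if __name__ == "__main__":
    v, p, s = [-1, 0, 2], [0.5, 0.3, 0.2], 0.3
    _, q = tilt(v, p, s)
    print(mean_var(v, q), mu_derivatives(v, p, s))
```

The two printed pairs should agree to within the finite-difference error, which is the statement that G has mean $\mu'(s)$ and variance $\mu''(s)$.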

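If F is the convolution of n identical and independent distributions,
each with $\mu(s)$ for its log moment generating function, then that for F itself
is equal to $n\mu(s)$. The interval in question is then $2\alpha\sqrt{n\mu''}$ while the center
position (for a fixed s) grows as $n\mu'(s)$.

If we integrate (1) from $-\infty$ to $\mu' + \alpha\sqrt{\mu''}$ and assume $s \le 0$, we obtain an
underbound on the tail of the distribution F in the negative direction. This
gives

$$F\left(\mu' + \alpha\sqrt{\mu''}\right) = e^{\mu}\int_{-\infty}^{\mu' + \alpha\sqrt{\mu''}} e^{-sx}\,dG(x)$$

$$\ge e^{\mu - s\mu' + s\alpha\sqrt{\mu''}}\int_{\mu' - \alpha\sqrt{\mu''}}^{\mu' + \alpha\sqrt{\mu''}} dG(x)$$

$$\ge \left(1 - \frac{1}{\alpha^2}\right)e^{\mu - s\mu' + s\alpha\sqrt{\mu''}} \qquad (3)$$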


If F is the convolution of n identical distributions each with $\mu(s)$ as the
logarithm of its moment generating function, this becomes

$$F\left(n\mu' + \alpha\sqrt{n\mu''}\right) \ge \left(1 - \frac{1}{\alpha^2}\right)e^{n[\mu - s\mu'] + s\alpha\sqrt{n\mu''}}$$

Thus the argument of F approaches asymptotically for large n the argument
$n\mu'$ appearing in the Chernoff upper bound. Likewise the exponent on
the right (and the coefficient $1 - \frac{1}{\alpha^2}$ can also be included as a term in the
exponent) approaches asymptotically the exponent in the Chernoff upper
bound.

These inequalities may also be extended to the case where F is a
convolution of not necessarily identical distributions with functions
$\mu_i(s)$ (i = 1, 2, ..., n). Then for F itself we have $\mu = \sum_i \mu_i$, $\mu' = \sum_i \mu_i'$
and $\mu'' = \sum_i \mu_i''$, and these may be substituted in (2) and (3). It is also
evident that these same inequalities for $s \ge 0$ give a lower bound on the
tail in the positive direction, that is, $1 - F(\mu' - \alpha\sqrt{\mu''} - 0)$.

Lower Bounds on Multinomial Tails and Terms


Suppose we have a discrete distribution: a random variable can assume
values $v_1 < v_2 < \cdots < v_t$ with probabilities $p_1, p_2, \ldots, p_t$. We wish
to establish a lower bound on the size of term that can be found in a small
interval when this distribution is convolved with itself n times (that is,
jumps in the distribution of the sum of n independent variables, each with
the given distribution). We first show the existence of a term having a
certain size near the mean of the convolved distribution. To do this, the
following lemma is first proved.

Parts of these results were obtained in collaboration with Peter Elias.


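Lemma: For any given n, we can find integers $n_1, n_2, \ldots, n_t$ such
that

$$|n_i - p_i n| \le 1 \qquad (1)$$

$$\sum_i n_i = n \qquad (2)$$

$$n\sum_i p_i v_i \le \sum_i n_i v_i \le n\sum_i p_i v_i + \Delta \qquad (3)$$

where $\Delta = \max_i\,(v_{i+1} - v_i)$.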

Proof: We first find a set of integers $m_i$ which satisfy all the
conditions except $\sum m_i v_i \le n\sum p_i v_i + \Delta$, and will then derive from these the
$n_i$. Choose $m_t$ to be the first integer greater than $p_t n$. Set $m_t - p_t n = \delta_t$.
Next, choose $m_1$ as the greatest integer less than $p_1 n$ and set $m_1 - p_1 n = \delta_1$.
If $\delta_t + \delta_1 > 0$, take another m from the low end (i.e., $m_2$), the largest
integer less than $p_2 n$, and then calculate $\delta_t + \delta_1 + \delta_2$ where $\delta_2 = m_2 - p_2 n$.
If this is positive, proceed with $p_3$, etc., until the accumulated sum of $\delta$'s
first becomes non-positive. When this occurs, terms are taken from the top
end of the v range ($p_{t-1}$, $p_{t-2}$, etc.) until the accumulated sum of $\delta$'s
goes positive.

This process is continued, alternating from one end to the other as
the sum of the $\delta$'s changes sign, and eventually will end with some index
k, having the property that all $m_i$ for i < k satisfy $m_i - p_i n = \delta_i \le 0$
while for all $m_i$ with i > k, we have $m_i - p_i n = \delta_i \ge 0$. At each stage of
the operation, the total accumulated discrepancy satisfies $|\sum \delta_i| \le 1$. This
is true at the beginning, and arguing inductively at each stage we add a $\delta$
of absolute value less than or equal to one to an accumulated $\sum \delta_i$ of
absolute value less than or equal to one and of opposite sign. This leads to
the next accumulated sum also being less than or equal to one in absolute
value. Hence, when the last assignment of $m_k$ is to be made,
$\left|\sum_{i \ne k} \delta_i\right| \le 1$.

If we let $m_k = n - \sum_{i \ne k} m_i$, then we satisfy $\sum m_i = n$ and also have

$$m_k = n - \sum_{i \ne k}(np_i + \delta_i)$$

$$= n - (n - np_k) - \sum_{i \ne k}\delta_i$$

$$= np_k - \sum_{i \ne k}\delta_i$$

Thus, $|\delta_k| = |m_k - np_k| \le 1$ also.

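Now since $\sum_i \delta_i = 0$ we have

$$-\sum_{i=1}^{h}\delta_i = \sum_{i=h+1}^{t}\delta_i$$

where h is the index of the largest negative $\delta_i$ (either k or k - 1).
Multiplying each side by $v_h$ and using the monotone ordering of the $v_i$ we
obtain

$$-\sum_{i=1}^{h}\delta_i v_i \le -\sum_{i=1}^{h}\delta_i v_h = \sum_{i=h+1}^{t}\delta_i v_h \le \sum_{i=h+1}^{t}\delta_i v_i$$

Hence, using the end expressions in the above inequalities,

$$\sum_{i=1}^{t}\delta_i v_i \ge 0$$

and therefore

$$\sum_{i=1}^{t} m_i v_i = \sum_{i=1}^{t} np_i v_i + \sum_{i=1}^{t}\delta_i v_i \ge n\sum_{i=1}^{t} p_i v_i \qquad (4)$$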


Now starting with the $m_i$ we can construct a set of $n_i$ which satisfy
all the conditions of the lemma. Note first that all the $\delta_i$ for i > h
are positive and for i ≤ h are negative. If we replace one of the lower
$m_i$, say $m_a$ (a ≤ h), by the next larger integer $m_a + 1$ and simultaneously
an $m_b$ (b > h) by the next lower integer $m_b - 1$, we retain the properties that
the errors in approximation satisfy $|\delta_i| \le 1$ and that their sum be zero
(or equivalently, $\sum m_i = n$). However, this reduces the value of $\sum m_i v_i$
by an amount $v_b - v_a$. Starting with the set of $m_i$ just derived, we shall
show how by interchanges of this type it is possible to go down from the
value $\sum m_i v_i$ by steps none of which is larger than $\Delta$, and eventually arrive
at a sum less than or equal to $n\sum p_i v_i$. It will follow that in this
sequence of operations there is a stage at which the third condition of
the lemma obtains.

The series of steps is constructed as follows. Perform the
interchange operation on $m_h$ (the last negative $\delta_i$) and $m_{h+1}$. Since
$v_{h+1} - v_h \le \Delta$, the change in the sum due to this change is less than or
equal to $\Delta$. Now in place of this interchange consider that of h against
h + 2, or that of h - 1 against h + 1. The additional change in these
cases over that just considered is clearly less than or equal to $\Delta$, being
indeed $v_{h+2} - v_{h+1}$ or $v_h - v_{h-1}$. The next stage would involve adding
to one end or the other of the interval already taken. This again changes
the sum from that previously obtained by not more than $\Delta$. This process
is continued until the ends of the range are reached, that is, $v_t$ and $v_1$
are used in the interchange. These are now left in the changed state and
the process is started again with $m_h$ and $m_{h+1}$. Working outward from
these eventually the numbers $m_2$ and $m_{t-1}$ are used. These are then left
in the changed state (that is at $m_2 + 1$ and $m_{t-1} - 1$) and again the
process started at $m_h$ and $m_{h+1}$. This procedure is continued until the
permanently changed m's from one end or the other reach $m_h$ or $m_{h+1}$, so
that further steps of this type are not possible. The set of changed
$m_i$'s, say $m_i'$, then existing have essentially the reverse property of the
original $m_i$ set; the corresponding $\delta_i'$ (that is $m_i' - p_i n$) satisfy $\delta_i' \ge 0$
for i ≤ h' and $\delta_i' \le 0$ for i > h', for a certain h'. Hence, using
essentially the same argument we used in proving (4), we can show that

$$\sum_{i=1}^{t}\delta_i' v_i \le 0$$

Thus this series of steps has at some stage given a set of integers
$n_i$ such that $0 \le \sum \delta_i v_i \le \Delta$, namely, the integers at the stage just before
this sum goes negative. For these $n_i$ we have, equivalently,

$$n\sum_i p_i v_i \le \sum_i n_i v_i \le n\sum_i p_i v_i + \Delta$$

This completes the proof of the lemma.

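Returning now to the original problem, consider the term in the nth
convolved distribution where the value $v_i$ is taken $n_i$ times (i = 1, 2, ..., t),
the $n_i$ being those of the lemma. In the multinomial distribution this gives
rise to a term of

$$\frac{n!}{\prod_i n_i!}\prod_i p_i^{n_i} \ge \frac{1}{(2\pi n)^{(t-1)/2}}\prod_i\left(\frac{n_i}{n}\right)^{-1/2}\exp\left(-\sum_i\frac{1}{12n_i} + \sum_i n_i\log p_i - \sum_i n_i\log\frac{n_i}{n}\right) \qquad (5)$$

This inequality is an application of the general inequality proved previously
for multinomials. We now wish to simplify this making use of the fact that
the $n_i$ are close to $p_i n$; $|n_i - p_i n| = |\delta_i| \le 1$. Consider the last terms in
the exponential:

$$\sum_i n_i\log p_i - \sum_i n_i\log\frac{n_i}{n} = -\sum_i n_i\log\left(1 + \frac{\delta_i}{p_i n}\right)$$

$$\ge -\sum_i \frac{n_i\delta_i}{p_i n} \qquad (\text{since } \log(1+x) \le x)$$

$$= -\sum_i \delta_i - \sum_i\frac{\delta_i^2}{p_i n} = -\sum_i\frac{\delta_i^2}{p_i n} \qquad (\text{since } \textstyle\sum_i \delta_i = 0)$$

$$\ge -\frac{1}{n}\sum_i\frac{1}{p_i}$$

The first exponential term can be estimated as follows.

$$\sum_i\frac{1}{12n_i} = \frac{1}{12n}\sum_i\frac{1}{p_i}\cdot\frac{p_i n}{n_i}$$

We now assume that, for each i, $p_i n > 1$ (in other words, that $n > 1/p_{\min}$).
It follows that for each i, $n_i \ge 1$ (since $n_i \ge p_i n - 1 > 0$ and $n_i$ is an integer)
and hence

$$\frac{p_i n}{n_i} = \frac{n_i - \delta_i}{n_i} \le 1 + \frac{1}{n_i} \le 2$$

Thus

$$\sum_i\frac{1}{12n_i} \le \frac{1}{6n}\sum_i\frac{1}{p_i}$$

Finally the coefficient in (5) can be underbounded as follows.

$$\prod_i\left(\frac{n_i}{n}\right)^{-1/2} = \prod_i p_i^{-1/2}\left(1 + \frac{\delta_i}{p_i n}\right)^{-1/2} \ge \prod_i p_i^{-1/2}\exp\left(-\frac{1}{2n}\sum_i\frac{1}{p_i}\right)$$

Collecting these various terms we have the following result:

Theorem: The sum of n independent random variables, each with the
same discrete distribution, probability $p_i$ of value $v_i$ (i = 1, 2, ..., t)
($v_i < v_{i+1}$), has a term in the closed interval from $n\sum p_i v_i$ to $n\sum p_i v_i + \Delta$,
where $\Delta = \max_i\,(v_{i+1} - v_i)$, and the term has a value at least

$$\frac{1}{(2\pi n)^{(t-1)/2}\sqrt{\prod_i p_i}}\;\exp\left(-\frac{5}{3n}\sum_i\frac{1}{p_i}\right)$$

provided $n > 1/p_{\min}$.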

This result may be generalized to give a term of such a distribution
anywhere in the possible range. This is done by writing the distribution
in terms of the tilted distribution, the sum of independent random variables
with probabilities $q_i(s) = p_i e^{v_i s} / \sum_j p_j e^{v_j s}$. As we have seen previously,
the distribution function of the original sum, $F_n(x)$, is related to that of
the tilted distribution function, $G_n(x)$, by the equation

$$dF_n(x) = e^{n\mu(s)}\,e^{-sx}\,dG_n(x)$$

The $G_n$ distribution has a term in the interval $A = n\mu'(s)$ to $A + \Delta$, since
$n\mu'(s)$ is the mean and the previous result applies. This gives a term in the
$F_n$ distribution, to the amount stated in the following.

Theorem: The sum of n independent random variables, each with the
same discrete distribution, probability $p_i$ of value $v_i$ ($v_i < v_{i+1}$)
(i = 1, 2, ..., t), has a term in the closed interval from A to $A + \Delta$,
where $\Delta = \max_i\,(v_{i+1} - v_i)$ and $nv_{\min} < A < nv_{\max}$. The term will have a
value at least

$$\frac{1}{(2\pi n)^{(t-1)/2}\sqrt{\prod_i q_i(s)}}\;\exp\left(-\frac{5}{3n}\sum_i\frac{1}{q_i(s)} - |s|\Delta\right)\,e^{n[\mu(s) - s\mu'(s)]}$$

where $q_i(s) = p_i e^{v_i s} / \sum_j p_j e^{v_j s}$ and s is chosen to make $A = n\sum_i q_i(s) v_i$,
and provided $n > 1/q_{\min}(s)$. The last term is the Chernoff bound with
$\mu(s) = \log \sum_i p_i e^{v_i s}$, $n\mu'(s) = A$.


A Combinatorial Theorem

Theorem: Suppose we have a set of objects $S_1, S_2, \ldots, S_n$ and a number of
numerically valued properties (functions) for the objects $P_1, P_2, \ldots, P_d$.
These are non-negative, $P_i(S_j) \ge 0$, and we know the averages $A_i$ of these
properties over the objects:

$$A_i = \frac{1}{n}\sum_{j=1}^{n} P_i(S_j)$$

Then there exists an object $S_p$ for which

$$P_i(S_p) \le dA_i, \qquad i = 1, 2, \ldots, d$$

More generally given any set of $K_i > 0$ satisfying

$$\sum_{i=1}^{d}\frac{1}{K_i} \le 1$$

then there exists an object $S_p$ with

$$P_i(S_p) \le K_i A_i, \qquad i = 1, 2, \ldots, d$$

Proof: The second part implies the first by taking $K_i = d$. To prove
the second part let $N_i$ be the number of objects for which $P_i(S) > K_i A_i$.
Now $A_i \ge \frac{1}{n}N_i K_i A_i$ (since all S's have $P_i$ values ≥ 0), with strict
inequality whenever $N_i > 0$. Hence

$$N_i < \frac{n}{K_i}$$

The total number of objects N violating any of the conditions is less than
or equal to the sum of the individual $N_i$:

$$N \le \sum_i N_i < n\sum_i\frac{1}{K_i} \le n \qquad \left(\text{using } \sum_i\frac{1}{K_i} \le 1\right)$$

Hence there is at least one object not violating any of the conditions.

Some Results on Determinants

The root of a determinant equation.

Lemma: Given $f_{ij}(W)$ (i, j = 1, 2, ..., d) continuous functions of W in
the range a ≤ W ≤ b and in this range $f_{ij}(W) \ge 0$,
$\sum_i f_{ij}(W) > 0$, $\sum_i f_{ij}(a) < 1$, $\sum_i f_{ij}(b) > 1$, then there exists W, a < W < b,
and a set of $X_i \ge 0$, $\sum_i X_i = 1$, such that

$$\sum_j f_{ij}(W)X_j = X_i$$

Proof: Consider the d dimensional region R whose points are $(X_1, \ldots, X_d, W)$
where $X_i \ge 0$, $\sum_i X_i = 1$, a ≤ W ≤ b. This is a topological image of a sphere
and its interior. For a fixed W in the range from a to b, consider the
continuous mapping

$$Y_i = \frac{\sum_j f_{ij}(W)X_j}{\sum_{ij} f_{ij}(W)X_j}$$

$$V_1 = W + 1 - \sum_{ij} f_{ij}(W)X_j$$

$$V = \begin{cases} V_1 & \text{if } a \le V_1 \le b \\ a & \text{if } V_1 < a \\ b & \text{if } V_1 > b \end{cases}$$

Note that the denominator for $Y_i$ does not vanish because of our assumption
that $\sum_i f_{ij}(W) > 0$ and hence the $Y_i$ are well defined. Also the $Y_i$ are
non-negative and sum to one, so the mapping carries points
(X, W) in R continuously into points (Y, V) in R. Consequently, by the
Brouwer fixed point theorem there exists a point (X, W) which is mapped
into itself, that is, a point for which $Y_i = X_i$,
V = W. The value of W for the fixpoint clearly is not a or b since these
points are moved upward or downward by our assumptions. Hence for the
fixpoint we have $W = W + 1 - \sum_{ij} f_{ij}(W)X_j$ or $\sum_{ij} f_{ij}(W)X_j = 1$. It follows
that for the fixpoint

$$\sum_j f_{ij}(W)X_j = X_i$$

Let the elements $a_{ij}$ of a matrix be non-negative. Suppose there is
an eigenvector $A_i$ all of whose components are positive, $A_i > 0$, and the
corresponding characteristic value is $\lambda_0$. We will show that for any
other characteristic value $\lambda$ we have $|\lambda| \le \lambda_0$. Let $B_i$ be a characteristic
vector for $\lambda$ where we adjust the length of this vector as follows.
Choose its length in such a way that $A_i - |B_i| \ge 0$ for all i and the
equality holds for at least one i, say i = h, so that $A_h = |B_h|$. It is
clear that this can be done since with zero length all components of B
are less than those of A and, increasing continuously, eventually a first
one of the $|B_i|$ reaches its corresponding $A_i$. We now have

$$\sum_i A_i a_{ij} = \lambda_0 A_j \qquad (1)$$

$$\sum_i B_i a_{ij} = \lambda B_j \qquad (2)$$

$$\sum_i |B_i|\,a_{ij} \ge |\lambda|\,|B_j| \qquad (3)$$

Subtracting these equations for j = h

$$\sum_i (A_i - |B_i|)\,a_{ih} \le \lambda_0 A_h - |\lambda|\,|B_h| = (\lambda_0 - |\lambda|)A_h \qquad (4)$$

All terms in the sum at the left are non-negative and also $A_h$ is definitely
positive. It follows that $\lambda_0 - |\lambda| \ge 0$.


The derivative of the eigenvalue of a matrix.

Suppose we have the square matrix $(a_{ij}(s))$ where the elements are
differentiable functions of a parameter s. Let V = V(s) be an eigenvalue
with corresponding eigenvector $A_i = A_i(s)$ and eigenvector $B_i = B_i(s)$
for the transposed matrix. Thus

$$|a_{ij}(s) - V(s)\delta_{ij}| = 0 \qquad (1)$$

$$\sum_i A_i a_{ij} = VA_j \qquad (2)$$

$$\sum_j a_{ij}B_j = VB_i \qquad (3)$$

Theorem:

$$V'(s) = \frac{\sum_{ij} A_i\,a_{ij}'(s)\,B_j}{\sum_i A_i B_i}$$

To prove this, differentiate (2) with respect to s:

$$\sum_i A_i' a_{ij} + \sum_i A_i a_{ij}' = V'A_j + VA_j'$$

Now multiply by $B_j$ and sum on j

$$\sum_{ij} A_i' a_{ij}B_j + \sum_{ij} A_i a_{ij}'B_j = V'\sum_j A_jB_j + V\sum_j A_j'B_j$$

Using (3) in the first term cancels the last term on the right, giving
the desired result.
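
The formula can be checked numerically on a small example. The sketch below is an illustration with hypothetical names: it uses plain power iteration (suitable here because the sample matrix has positive entries and a well-separated dominant eigenvalue) to obtain the two eigenvectors, then evaluates $\sum_{ij} A_i a_{ij}' B_j / \sum_i A_i B_i$. The sample matrix $a(s) = \begin{pmatrix} 1+s & 2 \\ 1+3s & 1 \end{pmatrix}$ at s = 1/2 is an assumption chosen so that the answer can be verified by hand.

```python
def perron(matrix, iters=2000):
    """Power iteration: dominant eigenvalue and right eigenvector
    (matrix x = V x) of a matrix with positive entries."""
    n = len(matrix)
    x = [1.0 / n] * n          # kept normalised so that sum(x) == 1
    val = 0.0
    for _ in range(iters):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        val = sum(y)           # equals V at convergence because sum(x) == 1
        x = [c / val for c in y]
    return val, x

def transpose(m):
    return [list(col) for col in zip(*m)]

def eigen_derivative(a, da):
    """V'(s) = sum_ij A_i a'_ij B_j / sum_i A_i B_i."""
    _, B = perron(a)                 # sum_j a_ij B_j = V B_i
    _, A = perron(transpose(a))      # sum_i A_i a_ij = V A_j
    n = len(a)
    num = sum(A[i] * da[i][j] * B[j] for i in range(n) for j in range(n))
    return num / sum(A[i] * B[i] for i in range(n))

if __name__ == "__main__":
    a = [[1.5, 2.0], [2.5, 1.0]]     # a(s) at s = 0.5
    da = [[1.0, 0.0], [3.0, 0.0]]    # element-wise derivative da/ds
    print(perron(a)[0], eigen_derivative(a, da))
```

For this matrix the dominant eigenvalue is 3.5, the eigenvectors are proportional to A = (1, 0.8) and B = (1, 1), and the formula gives 3.4/1.8 = 17/9, in agreement with differentiating the closed-form root of the characteristic equation.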




Upper and Lower Bounds for Powers of a Matrix with Non-negative Elements

We frequently have to deal with the nth power of a matrix whose
elements are $\beta_{ij} \ge 0$. We denote the ij element of this nth power by $\beta_{ij}^{(n)}$.
We are concerned here with the case where the corresponding graph has the
property that it is possible to go from any node i to any other j by a
finite sequence $\beta_{ik}, \beta_{kl}, \ldots, \beta_{mj}$ where all the $\beta$'s in this series are
positive. This means that the graph consists of one ergodic or periodic
set in the usual Markoff analysis. The non-negative condition on the $\beta_{ij}$
insures the existence of a real eigenvalue $\nu_0$ which is a solution of the
determinant equation $|\beta_{ij} - \nu\delta_{ij}| = 0$. Further, this $\nu_0$ dominates in
absolute value any other eigenvalue $\nu$, that is, $\nu_0 \ge |\nu|$.

Corresponding to root $\nu_0$ there will exist right and left eigenvectors
for the matrix

$$\sum_i A_i\beta_{ij} = \nu_0 A_j$$

$$\sum_j \beta_{ij}B_j = \nu_0 B_i \qquad (1)$$

The conditions $\beta_{ij} \ge 0$ imply that all the $A_i$ be the same sign (or vanish)
and all the $B_i$ be the same sign (or vanish). In both cases we take them
to be positive (multiply by -1 if necessary). In the case satisfying the
graphical condition it is easily seen that all $A_i$ and all $B_i$ are then actually
positive (none vanish).

Theorem: Under the conditions above, i.e. $\beta_{ij} \ge 0$ and any state
accessible from any other through a finite sequence of non-vanishing transitions,
the element $\beta_{ij}^{(n)}$ of the nth power satisfies

$$\beta_{ij}^{(n)} \le \frac{B_i}{B_j}\,\nu_0^{\,n} \le \left(\frac{\nu_0}{\beta_{\min}}\right)^{d}\nu_0^{\,n}$$

where $\beta_{\min}$ is the smallest (non-vanishing) $\beta_{ij}$, and d is an integer such that
there is a path from any state i to any state j with not more than d steps
(d - 1 intermediate states). Furthermore, there will exist K > 0 and $n_0$
such that

$$\beta_{ij}^{(n)} \ge K\nu_0^{\,n}, \qquad n \ge n_0$$

provided either (1) for some n, $\beta_{ij}^{(n)} > 0$ for all i, j or (2) the state
diagram has no recurrent subsets (the greatest common divisor of closed
path lengths is 1).

Proof: The first inequality is proved easily by induction on n.
For n = 0,

$$\beta_{ij}^{(0)} = \delta_{ij} \le \frac{B_i}{B_j}$$

since for i ≠ j, the right member is positive and $\delta_{ij} = 0$, while for i = j,
$\delta_{ij} = 1$ and the right member is one.

Now supposing the inequality to hold for n we prove it for n + 1.

$$\beta_{ij}^{(n+1)} = \sum_k \beta_{ik}\,\beta_{kj}^{(n)} \le \sum_k \beta_{ik}\frac{B_k}{B_j}\,\nu_0^{\,n} = \frac{B_i}{B_j}\,\nu_0^{\,n+1}$$

This is the corresponding inequality for n + 1, concluding the proof.

The second inequality, that $B_i/B_j \le (\nu_0/\beta_{\min})^d$, is shown as follows.
From (1), let some $\beta_{ij}$ be positive, then

The Number of Sequences of a Given Length

Suppose a number of letters are available whose lengths (or durations)
are $a_1, a_2, \ldots, a_s$ and we wish a bound on the number $N(\ell)$ of sequences of
total length $\ell$. Here it is assumed that any sequence of letters is
allowed. $N(\ell)$ satisfies the difference equation

$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_s)$$

as we see by noting that each sequence of length $\ell$ must end in one or
another of the available letters. Furthermore, the boundary conditions
may be taken to be $N(\ell) = 0$ for $\ell < 0$ and N(0) = 1. Associated with the
difference equation is the following characteristic equation:

$$1 = X^{-a_1} + X^{-a_2} + \cdots + X^{-a_s}$$

Since all the $a_i$ are positive and real, the right-hand member is a strictly
monotone decreasing function of X and varies from $\infty$ to 0 when X goes
from 0 to $\infty$. Consequently, the characteristic equation has a unique
positive real root W.

Theorem: $N(\ell) \le W^{\ell}$.

To prove this, note first that $W^{\ell}$ satisfies the difference equation, since this results on multiplying the characteristic equation (with $X$ replaced by $W$) by $W^{\ell}$. With regard to the boundary conditions, $W^0 = 1 = N(0)$ and $W^{\ell} \ge N(\ell)$ when $\ell < 0$. Let $a$ be the smallest of $a_1, a_2, \ldots, a_g$. Then it is possible to proceed by a kind of induction in steps of $\ell$ (each of length $a$) to show that the dominance of $W^{\ell}$ over $N(\ell)$ continues for all $\ell$. In fact, suppose that for $\ell \le \ell_1$ we have $N(\ell) \le W^{\ell}$. Then for $\ell$ in the range $\ell_1 < \ell \le \ell_1 + a$,

$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_g) \le W^{\ell - a_1} + W^{\ell - a_2} + \cdots + W^{\ell - a_g} = W^{\ell}.$$

Since the inequality is true for $\ell \le 0$, it follows that it is true for all $\ell$.
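The bound can be checked numerically in a simple case. The following is a minimal sketch, assuming integer letter lengths (the text allows any positive real lengths); the function names `count_sequences` and `characteristic_root` are illustrative and do not come from the paper. It counts sequences by the difference equation and finds the root W of the characteristic equation by bisection.

```python
def count_sequences(lengths, max_len):
    """Return [N(0), ..., N(max_len)] for positive integer letter lengths."""
    counts = [0] * (max_len + 1)
    counts[0] = 1  # boundary condition N(0) = 1; N(l) = 0 for l < 0
    for total in range(1, max_len + 1):
        # Each sequence of length `total` ends in one of the available letters.
        counts[total] = sum(counts[total - a] for a in lengths if a <= total)
    return counts


def characteristic_root(lengths, iterations=200):
    """Positive root W of sum_i W**(-a_i) = 1, found by bisection."""
    def excess(x):
        return sum(x ** (-a) for a in lengths) - 1.0

    lo, hi = 1.0, 2.0          # excess(1) = g - 1 >= 0
    while excess(hi) > 0:      # the sum is decreasing in x, so this terminates
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


lengths = [1, 2]               # two letters, of lengths 1 and 2
W = characteristic_root(lengths)
N = count_sequences(lengths, 20)
# Small relative tolerance only to absorb floating-point rounding.
bound_holds = all(N[l] <= W ** l * (1 + 1e-9) for l in range(21))
```

With lengths 1 and 2 the counts are Fibonacci numbers and W is the golden ratio, so the theorem reduces to a familiar inequality.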

A more general problem of the same sort relates to sequences which are subject to a finite state set of constraints. Thus, suppose there are $d$ states and that in state $i$, letters of lengths $\ell_{ij}^{(a)}$ are permitted, leading to state $j$. The index $a$ ranges over the different letters going from state $i$ to state $j$ and $j$ ranges over the different states which can follow state $i$. Now let $N_{ij}(\ell)$ be the number of sequences which are possible and which start in state $i$, end in state $j$ and are of length $\ell$. These quantities are readily seen to satisfy the difference equations

$$N_{ij}(\ell) = \sum_{k,a} N_{kj}\bigl(\ell - \ell_{ik}^{(a)}\bigr), \qquad N_{ij}(\ell) = 0 \text{ for } \ell < 0.$$

The corresponding characteristic equations are

$$A_i = \sum_{a,j} A_j W^{-\ell_{ij}^{(a)}}. \qquad (2)$$

Let $W$ be the largest real root (there is a positive real root by a previous result based on the fixed point theorem) of the determinant equation:

$$\Bigl|\, \sum_a W^{-\ell_{ij}^{(a)}} - \delta_{ij} \Bigr| = 0$$

and let $A_i$ be a corresponding (positive) solution of (2). We will assume the graph of the constraints is fully connected so it is possible to go from any state to any other. Then all the $A_i$ are positive (none vanish).

We will now show that the number of sequences of length $\ell$ starting in state $i$ and ending in $j$, $N_{ij}(\ell)$, is bounded by

$$N_{ij}(\ell) \le \frac{A_i}{A_j} W^{\ell}.$$

This is certainly true for $\ell < 0$ and also for $\ell = 0$ since then both sides are one if $i = j$, and otherwise the left side is zero with the right positive. We now proceed by the inductive type process as before, assuming the inequality out to some $\ell_1$ and then show it follows for $\ell$ out to $\ell_1$ plus the minimum $\ell_{ij}^{(a)}$:

$$N_{ij}(\ell) = \sum_{k,a} N_{kj}\bigl(\ell - \ell_{ik}^{(a)}\bigr) \le \sum_{k,a} \frac{A_k}{A_j} W^{\ell - \ell_{ik}^{(a)}} = \frac{W^{\ell}}{A_j} \sum_{k,a} A_k W^{-\ell_{ik}^{(a)}} = \frac{A_i}{A_j} W^{\ell}.$$

Thus the inductive step carries the inequality up to $\ell_1 + \min \ell_{ij}^{(a)}$ and hence it is true for all $\ell$.

An Alternative Proof that $N(\ell) \le W^{\ell}$

Consider the case of a sequence of letters of different lengths $a_1, a_2, \ldots, a_g$ with no constraints. We wish to prove that $N(\ell) \le W^{\ell}$, where $W$ satisfies $\sum_i W^{-a_i} = 1$. Assume, in contradiction, that for some $\ell$, $N(\ell) > W^{\ell}$. Then, since $N(0) \le W^0$, there is a greatest lower bound of $\ell$'s, say $\ell^*$, for which the theorem fails. In the interval $\ell^* \le \ell < \ell^* + a_{\min}$ there must be an $\ell$, say $\ell_1$, for which the theorem fails ($a_{\min}$ is the smallest $a_i$). Subdivide the sequences of length $\ell_1$ into subsets according to the first letter. Let the fractional number in the subset beginning with the letter $i$ be $f_i$ ($i = 1, 2, \ldots, g$). Choose the subset for which $a_i^{-1} \log f_i^{-1}$ is a minimum. In a sense, this means the subset which conveys the least information, $\log f_i^{-1}$, per unit time in its first letter. The minimum value of $a_i^{-1} \log f_i^{-1}$ among the different subsets is less than or equal to $\log W$. To see this, suppose, in contradiction, that for all $i$, $a_i^{-1} \log f_i^{-1} > \log W$. Then $f_i < W^{-a_i}$ and, summing on $i$, $1 = \sum f_i < \sum W^{-a_i} = 1$, a contradiction.

Hence the subset chosen will have $a_i^{-1} \log f_i^{-1} \le \log W$, or $f_i \ge W^{-a_i}$. If we delete the first letter from all sequences in this subset, we are left with a set of more than $W^{\ell_1 - a_i}$ sequences of length $\ell_1 - a_i$. Thus $N(\ell_1 - a_i) > W^{\ell_1 - a_i}$. Since $\ell_1 - a_i < \ell^*$ this contradicts the assumption that $\ell^*$ was the greatest lower bound of $\ell$'s for which the theorem fails. Hence the theorem is true for all $\ell$.
Characteristic for a Language with Independent Letters

Suppose we have a stochastic process generating a language consisting of a sequence of independent letters. These letters are all chosen with the probabilities $p_i$ for letter $i$, $i = 1, 2, \ldots, g$. We consider sequences of $n$ such letters, that is, words of length $n$ in the language. Suppose that all such words are arranged in order of decreasing probability from the most probable one, consisting of a sequence of $n$ most probable letters, down to the sequence of $n$ least probable letters. The logarithm of the probability of any particular word is (because of the independence of letters) the sum of the logarithms of the probabilities of the individual letters. Thus, the logarithm of the probability of a word is a random variable which is the sum of $n$ independent random variables each with the same distribution function. We may, therefore, apply previous results concerning the tails of such a distribution to estimate the probability in our monotone sequence of all words beyond a certain point.

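The distribution of $\log p^{-1}$ for a single letter will have a moment generating function

$$\varphi(s) = \sum_i p_i\, e^{s \log p_i^{-1}} = \sum_i p_i^{1-s}.$$

Hence

$$\mu(s) = \log \sum_i p_i^{1-s}, \qquad \mu'(s) = \frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}}. \qquad (1)$$

Our upper bound on the tail of a distribution then shows that the total probability $P_T$ of all sequences whose individual probability $P$ satisfies

$$\frac{1}{n} \log P \le -\mu'(s) = -\frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}} \qquad (2)$$

is bounded by

$$\frac{1}{n} \log P_T \le \mu(s) - s\mu'(s) = \log \sum_i p_i^{1-s} + s\, \frac{\sum_i p_i^{1-s} \log p_i}{\sum_i p_i^{1-s}}. \qquad (3)$$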

This last expression, as well as (1), can be written more compactly in terms of a new set of probabilities $q_i(s)$:

$$q_i(s) = \frac{p_i^{1-s}}{\sum_j p_j^{1-s}}.$$

The relations (2) and (3) now become, after some manipulation,

$$\frac{1}{n} \log P \le \sum_i q_i(s) \log p_i, \qquad \frac{1}{n} \log P_T \le \sum_i q_i(s) \log \frac{p_i}{q_i(s)}. \qquad (4)$$

This is one of the results we desire, an overbound on the tail of the distribution of probability for sequences.

We now desire a similar bound on the number of sequences whose probability is greater than $P$. To this end, consider constructing all sequences of length $n$ giving each letter probability $\frac{1}{g}$ (instead of the probabilities $p_i$ they actually have). We again consider the distribution of the sum of the logarithms of the probabilities (using the original $p_i$ values) for the letters in a word. Note that the sequences arranged in monotone order are in the same order as previously. Under these new conditions the moment generating function $\varphi_1(s)$ and its logarithm $\mu_1(s)$ are given by

$$\varphi_1(s) = \frac{1}{g} \sum_i p_i^{-s}, \qquad \mu_1(s) = \log \sum_i p_i^{-s} - \log g.$$

The total probability $P_2$ of all sequences in the tail of the distribution beyond the sequences whose individual probability $P$ satisfies

$$\frac{1}{n} \log P \ge -\mu_1'(s) = \frac{\sum_i p_i^{-s} \log p_i}{\sum_i p_i^{-s}} \qquad (5)$$

will be bounded by

$$\frac{1}{n} \log P_2 \le \mu_1(s) - s\mu_1'(s) = \log \sum_i p_i^{-s} + s\, \frac{\sum_i p_i^{-s} \log p_i}{\sum_i p_i^{-s}} - \log g.$$

We note first that in this modified probability system (each letter with probability $\frac{1}{g}$) all sequences have probability $g^{-n}$ and consequently the number of sequences $N_2$ in the tail whose total probability is $P_2$ is precisely $P_2\, g^n$. Hence the number $N_2$ in the tail is bounded by

$$\frac{1}{n} \log N_2 = \frac{1}{n} \log P_2\, g^n = \frac{1}{n} \log P_2 + \log g \le \log \sum_i p_i^{-s} + s\, \frac{\sum_i p_i^{-s} \log p_i}{\sum_i p_i^{-s}}.$$

In order to compare this result with the preceding one (4), we must identify the points at which the tails of the distributions are cut off. This can be done by equating the probabilities $P$ of the individual sequences at the cutoff point. Thus, using (1) and (5) and writing $\sigma$ in the latter in place of $s$ we have

$$\frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}} = \frac{\sum_i p_i^{-\sigma} \log p_i^{-1}}{\sum_i p_i^{-\sigma}}.$$

This is obviously satisfied by $1 - s = -\sigma$, and since $\mu''(s) > 0$ the left term is a strictly monotone function of $s$ and therefore this solution is unique.

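The number of sequences $N_2$ now becomes, in terms of the $s$ involved in (1) and (4),

$$\frac{1}{n} \log N_2 \le \log \sum_i p_i^{1-s} + (1-s)\, \frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}}.$$

Again using the $q_i(s)$ to simplify,

$$\frac{1}{n} \log N_2 \le \sum_i q_i(s) \log q_i(s)^{-1}. \qquad (6)$$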

Both the bounds (4) and (6) are also the limiting values approached by $\frac{1}{n} \log P_T$ and $\frac{1}{n} \log N_2$ as $n \to \infty$. This follows from remarks concerning the tails of distributions made in an earlier section. Thus the reliability curve of a source of the type we are discussing here with independent letters may be written in parametric form as follows:

$$E(s) = \sum_i q_i(s) \log \frac{q_i(s)}{p_i} = -\bigl(\mu(s) - s\mu'(s)\bigr) \qquad (7)$$

$$R(s) = \sum_i q_i(s) \log q_i(s)^{-1} = \mu(s) - (s-1)\mu'(s) \qquad (8)$$

where

$$q_i(s) = \frac{p_i^{1-s}}{\sum_j p_j^{1-s}}. \qquad (9)$$
i 

The parameter $s$ in these equations is related to the slope of the reliability curve. In fact, we note that

$$\frac{dE}{dR} = \frac{dE/ds}{dR/ds} = \frac{s\mu''(s)}{\mu'(s) + (1-s)\mu''(s) - \mu'(s)} = \frac{s}{1-s}.$$

Thus, as $s$ increases from 0 to 1, the slope increases monotonically from 0 to $\infty$. It is interesting that at $s = 1$ the formulas (7), (8) become

$$E(1) = -\frac{1}{g} \sum_i \log p_i - \log g$$

$$R(1) = \log g$$

The Probability of Error


A problem of importance in information theory is that of studying the behavior of signaling codes that may be used in encoding an information source for a noisy channel and, in particular, the probability of error for the optimal code. This paper is concerned with estimating this probability of error under fairly general conditions.

We will find that, to a large extent, the problem can be divided into two parts. First, there is a problem relating to the information source only (not involving the channel) which involves estimating the probability of error when the source is encoded into a simple standard noiseless channel. The study of this question leads to a certain function which we call the reliability characteristic for the source and which determines, in a certain asymptotic sense when the code blocks are long, how rapidly the probability of error approaches zero. Second, there is a problem relating to the channel only. This leads to a function describing, in a sense, the coding behavior of the channel with regard to probability of error when the code blocks are long. Our final and most basic results show how the two functions may be combined to give optimal behavior (or bounds on optimal behavior) when the source is encoded into the channel.

We will first clarify our terminology, since various writers have used some of the terms involved with quite different meanings. For the most part, we will restrict ourselves to a finite, discrete, memoryless channel. Such a channel is specified by a transition probability matrix $\|p_i(j)\|$. Here $p_i(j)$ is the probability that if input symbol $i$ is used, the output will be $j$, and we have

$$p_i(j) \ge 0, \qquad \sum_j p_i(j) = 1.$$

Matrices satisfying the conditions that all elements are nonnegative and the row sums are unity occur often in probability and are called stochastic matrices.

The input symbols to the channel will be called the input letters, the set of these the input alphabet. The output symbols of the channel will be called the output letters and the set of these the output alphabet. A channel is often conveniently represented by a line diagram of the type shown in Fig. 1.

The channel being memoryless means that successive operations are independent. If the input letters $i$ and $j$ are used, the probability of output letters $k$ and $\ell$ will be $p_i(k)\, p_j(\ell)$. A sequence of input letters will be called an input word, a sequence of output letters an output word. A collection of $M$ input words all of length $n$ will be called a block code of length $n$. $R = \frac{1}{n} \log M$ will be called the input rate for this code. Unless otherwise specified, a code will mean such a block code.

A detection system for a code is a method of interpreting output words as input words, that is, an association or mapping of one of the input words of the code for every output word of length $n$. The probability of error for a particular input word is the probability, if this input is used, that it will be interpreted incorrectly. It is, therefore, the probability of that input word being received as an output word which is not detected as the input word. The probability of error for a code is the average probability of error for all input words in the code. An optimal code of length $n$ is one which minimizes this probability of error (when using its best detection system). These input words $u_1, u_2, \ldots, u_M$ need not all be different.

Our main problem is to estimate for a general channel upper and lower bounds on the probability of error for an optimal code as a function of the length of the code $n$ and the rate of transmission $R$. The ideal solution would be to find a simple explicit formula for the probability of error in an arbitrary channel as a function of the rate of transmission $R$ and the length of the code words $n$. This is probably too much to hope for in view of the diophantine complexities of optimal codes. Barring such a complete solution, one may still hope for upper and lower bounds on $P_e$ and perhaps results relating to its asymptotic behavior when $n$ is large. Most of the present paper is devoted to this type of result.

In studying the asymptotic behavior, it will appear that $P_e$, for a fixed rate $R$ and a given channel, varies approximately exponentially with $n$. For this reason it is convenient to introduce a new term. If a device or a system has a probability $P_e$ of making an error, we shall call $-\log P_e$ the reliability of the device or system. We have just said in effect that for large $n$ the reliability for optimal codes varies essentially linearly with $n$, that is, as $E(R) \cdot n$, where $R$ is the rate for the code. More precisely, we define $E(R)$ as follows:

$$E(R) = \limsup_{n \to \infty}\; -\frac{1}{n} \log P_{e\,\mathrm{opt}}$$

We will call $E(R)$ the reliability characteristic of the channel and attempt to evaluate it, or where we cannot do this, at least place upper and lower bounds on it.

The writer feels that the quantity we have defined as reliability will, in many cases, turn out to be the most appropriate way of measuring a probability of error. In previous work by von Neumann on unreliable neuron-type elements and by E. F. Moore and the writer on unreliable relays, the quantity $-\log P_e$ entered significantly and was the more natural way to describe some of the results. In both these cases the reliability varied rather simply with the redundancy of the error-correcting systems. It is a little like measuring gain on a db scale or ion concentration on a pH scale. While actually little more than a change in scale, the use of these units of reliability in the coding case throws the results into a much more natural and illuminating perspective.

If we have two given channels, it is possible to form a single channel from them in two natural ways which we call the sum and product of the two channels. The sum of two channels is the channel formed by using inputs from either of the two given channels with the same transition probabilities to the set of output letters consisting of the logical sum of the two output alphabets. Thus the sum channel is defined by a transition matrix formed by placing the matrix of one channel below and to the right of that for the other channel and filling the remaining two rectangles with zeros. If $\|p_i(j)\|$ and $\|p'_i(j)\|$ are the individual matrices, the sum has the following matrix:

$$\begin{pmatrix}
p_1(1) & \cdots & p_1(r) & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
p_t(1) & \cdots & p_t(r) & 0 & \cdots & 0 \\
0 & \cdots & 0 & p'_1(1) & \cdots & p'_1(r') \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & p'_{t'}(1) & \cdots & p'_{t'}(r')
\end{pmatrix}$$

The product of two channels is the channel whose input alphabet consists of all ordered pairs $(i, i')$ where $i$ is a letter from the first channel alphabet and $i'$ from the second, whose output alphabet is the similar set of ordered pairs of letters from the two individual output alphabets and whose transition probability from $(i, i')$ to $(j, j')$ is $p_i(j)\, p'_{i'}(j')$.


Fig.  2 
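The sum and product constructions can be written out directly on transition matrices. This is a minimal sketch, assuming each channel is given as a list of rows (one row per input letter, one column per output letter); the function names and the two small example channels are illustrative, not taken from the paper.

```python
def sum_channel(p, q):
    """Block-diagonal matrix: p in the upper left, q in the lower right, zeros elsewhere."""
    r_p, r_q = len(p[0]), len(q[0])
    top = [list(row) + [0.0] * r_q for row in p]
    bottom = [[0.0] * r_p + list(row) for row in q]
    return top + bottom


def product_channel(p, q):
    """Inputs are pairs (i, i'), outputs are pairs (j, j'); entries are p_i(j) * q_i'(j')."""
    return [[p_ij * q_kl for p_ij in p_row for q_kl in q_row]
            for p_row in p for q_row in q]


# Two hypothetical channels used only to exercise the functions.
first = [[0.9, 0.1],
         [0.1, 0.9]]
second = [[0.5, 0.5, 0.0],
          [0.0, 0.5, 0.5]]

summed = sum_channel(first, second)        # 4 inputs, 5 outputs
product = product_channel(first, second)   # 4 inputs, 6 outputs
```

Both results are again stochastic matrices: every row of `summed` and of `product` still sums to one.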
Zero Error Codes and the Zero Error Capacity $C_0$

In a discrete channel we will say that two input letters are adjacent if there is an output letter which can be caused by either of these two. Thus, $i$ and $j$ are adjacent if there exists a $t$ such that both $p_i(t)$ and $p_j(t)$ do not vanish. In Fig. 1, a and c are adjacent, while a and d are not.

If all input letters are adjacent to each other, any code with more than one word has a probability of error greater than zero. In fact, the probability of error satisfies

$$P_e \ge \frac{M-1}{M}\, p_{\min}^{\,n}$$

where $p_{\min}$ is the smallest among the $p_i(j)$, $n$ is the length of the code and $M$ is the number of words in the code. To prove this, note that any two words have a possible output word in common, namely the word consisting of the sequence of common output letters when the two input words are compared letter by letter. Each of the two input words has a probability at least $p_{\min}^{\,n}$ of producing this common output word. In using the code, the two particular input words will each occur $\frac{1}{M}$ of the time and will cause the common output $\frac{1}{M} p_{\min}^{\,n}$ of the time. This output can be decoded in only one way. Hence at least one of these situations leads to an error. This error, $\frac{1}{M} p_{\min}^{\,n}$, is assigned to this code word, and from the remaining $M - 1$ code words another pair is chosen. A source of error to the amount $\frac{1}{M} p_{\min}^{\,n}$ is assigned in similar fashion to one of these, and this is a disjoint event. Continuing in this manner, we obtain a total of $\frac{M-1}{M} p_{\min}^{\,n}$ probability of error.

It follows that for any rate $R$ greater than zero (i.e. $M \ge 2$),

$$-\frac{1}{n} \log P_e \le -\log p_{\min} + \frac{1}{n} \log 2, \qquad E \le -\log p_{\min}.$$

If it is not true that the input letters are all adjacent to each other, it is possible to transmit at a positive rate with zero probability of error. The least upper bound of all rates which can be achieved with zero probability of error will be called the zero error capacity of the channel and denoted by $C_0$. If we let $M_0(n)$ be the largest number of words in a code of length $n$, no two of which are adjacent, then $C_0$ is the least upper bound of the numbers $\frac{1}{n} \log M_0(n)$ when $n$ varies through all positive integers. An interesting problem which has not been completely solved is that of evaluating $C_0$ for an arbitrary channel.

One might expect that $C_0$ would be equal to $\log M_0(1)$, that is, that if we choose the largest possible set of non-adjacent letters and form all sequences of these of length $n$, then this would be the best error free code of length $n$. This is not, in general, true, although it holds in many cases, particularly when the number of input letters is small. The first failure occurs with five input letters with the channel in Fig. 2. In this channel, it is possible to choose at most two independent letters, for example 0 and 2. Using sequences of these, 00, 02, 20, and 22, we obtain four words in a code of length two. However, it is possible to construct a code of length two with five members no two of which are adjacent as follows: 00, 12, 24, 31, 43. It is readily verified that no two of these are adjacent. Thus, $C_0$ for this channel is at least $\frac{1}{2} \log 5$.
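The verification can be carried out mechanically. The sketch below rests on an assumption that is not visible in this text because the figure is missing: that the channel of Fig. 2 has five input letters 0 to 4 arranged in a cycle, each adjacent only to its two neighbours. That reading is consistent with 0 and 2 being independent and with at most two independent letters existing. The function names are illustrative.

```python
from itertools import combinations


def letters_adjacent(x, y, size=5):
    """Assumed Fig. 2 structure: a letter is adjacent to itself and to its cyclic neighbours."""
    return (x - y) % size in (0, 1, size - 1)


def words_adjacent(u, v):
    """Two words can be confused only if they are adjacent (or equal) in every position."""
    return all(letters_adjacent(x, y) for x, y in zip(u, v))


code = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]
is_zero_error = not any(words_adjacent(u, v) for u, v in combinations(code, 2))

# Every choice of three letters contains an adjacent pair, so at most two are independent.
max_two_independent = all(
    any(letters_adjacent(a, b) for a, b in combinations(triple, 2))
    for triple in combinations(range(5), 3)
)
```

Under that assumption the five-word code beats the four words obtainable from the letters 0 and 2 alone, which is the point of the example.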

No method has been found for determining $C_0$ for the general discrete channel, and this we propose as an important unsolved problem in coding theory. We shall develop a number of results which enable one to determine $C_0$ in many special cases, for example, in all channels with five or less inputs with the single exception of the channel of Fig. 2 (or channels equivalent in adjacency structure to it). We will also develop some general inequalities enabling one to estimate $C_0$ quite closely in most cases.

It may be seen, in the first place, that the value of $C_0$ depends only on which input letters are adjacent to each other. Let us define an adjacency matrix for a channel, $A_{ij}$, as follows:

$$A_{ij} = \begin{cases} 1 & \text{if input letter } i \text{ is adjacent to } j \text{ or if } i = j \\ 0 & \text{otherwise} \end{cases}$$

Suppose two channels have the same adjacency matrix (possibly after renumbering the input letters of one of them). Then it is obvious that a zero error code for one will be a zero error code for the other and, hence, that the zero error capacity $C_0$ for one will also apply to the other.

The adjacency structure contained in the adjacency matrix can also be represented as a linear graph. Construct a graph with as many vertices as there are input symbols, and connect two distinct vertices with a line or branch of the graph if the corresponding input letters are adjacent. Some examples are shown in Fig. 3, corresponding to the channels of Fig. 1 and 2.
Theorem: The zero error capacity $C_0$ of a discrete memoryless channel is bounded by the inequalities

$$-\log \min_{P_i} \sum_{i,j} A_{ij} P_i P_j \;\le\; C_0 \;\le\; \min_{p_i(j)} C,$$

where $C$ is the capacity of any channel with transition probabilities $p_i(j)$ and having the adjacency matrix $A_{ij}$. The upper bound is fairly obvious. The zero error capacity is certainly less than or equal to the ordinary capacity for any channel since the former requires codes with zero probability of error while the latter requires codes approaching zero probability of error. By minimizing the capacity through variation of the $p_i(j)$ we find the lowest upper bound available through this argument. Since the capacity is a continuous function of the $p_i(j)$ in the closed region defined by $p_i(j) \ge 0$, $\sum_j p_i(j) = 1$, we may write min instead of greatest lower bound.

It is worth noting that it is only necessary to consider a particular channel in performing this minimization, although there are an infinite number with the same adjacency matrix. This one particular channel is obtained as follows from the adjacency matrix. If $A_{ik} = 1$ for a pair $i, k$, define an output letter $j$ with $p_i(j)$ and $p_k(j)$ both differing from zero. Now if there are any three input letters, say $i, k, l$, all adjacent to each other, define an output letter, say $m$, with $p_i(m)$, $p_k(m)$, $p_l(m)$ all different from zero. In the graph this corresponds to a complete subgraph with three vertices. Next, subsets of four letters or complete subgraphs of four vertices, say $i, k, l, m$, are given an output letter, each being connected to it, and so on. It is evident that any channel with the same adjacency matrix differs from that just described only by variation in the number of output symbols for some of the pairs, triplets, etc., of adjacent input letters. If a channel has more than one output symbol for an adjacent subset of input letters, then its capacity is reduced by identifying these. If a channel contains no element, say for a triplet $i, k, l$ of adjacent input letters, this will occur as a special case of our canonical channel which has output letter $m$ for this triplet when $p_i(m)$, $p_k(m)$ and $p_l(m)$ all vanish.

The lower bound of the theorem will now be proved. We use the procedure of random codes based on probabilities for the letters $P_i$, these being chosen to minimize the quadratic form $\sum_{ij} A_{ij} P_i P_j$. Construct an ensemble of codes each containing $M$ words, each word $n$ letters long. The words in a code are chosen by the following probability method. Each letter of each word is chosen independently of all others and has the value $i$ with probability $P_i$. We now compute the probability in the ensemble that any particular word is not adjacent to any other word in its code. The probability that the first letter of one word is adjacent to the first letter of a second word is $\sum_{ij} A_{ij} P_i P_j$, since this sums the cases of adjacency with coefficient 1 and those of non-adjacency with coefficient 0. The probability that two words are adjacent in all letters, and therefore adjacent as words, is $\bigl(\sum_{ij} A_{ij} P_i P_j\bigr)^n$. The probability of non-adjacency is therefore $1 - \bigl(\sum_{ij} A_{ij} P_i P_j\bigr)^n$. The probability that all $M - 1$ other words in a code are not adjacent to a given word is, since they are chosen independently,

$$\Bigl[\,1 - \Bigl(\sum_{ij} A_{ij} P_i P_j\Bigr)^{n}\Bigr]^{M-1},$$

which is, by a well known inequality, greater than $1 - (M-1)\bigl(\sum_{ij} A_{ij} P_i P_j\bigr)^n$, which in turn is greater than $1 - M \bigl(\sum_{ij} A_{ij} P_i P_j\bigr)^n$. If we set $M = (1 - \epsilon)^n \bigl(\sum_{ij} A_{ij} P_i P_j\bigr)^{-n}$ we then have, by taking $\epsilon$ small, a rate as close as desired to $-\log \sum_{ij} A_{ij} P_i P_j$. Furthermore, once $\epsilon$ is chosen, by taking $n$ sufficiently large, we can insure that $M \bigl(\sum_{ij} A_{ij} P_i P_j\bigr)^n = (1 - \epsilon)^n$ is as small as desired, say, less than $\delta$. The probability in the ensemble of codes of a particular word being adjacent to any other in its own code is now less than $\delta$. This implies that there are codes in the ensemble for which the ratio of the number of such undesired words to the total number in the code is less than or equal to $\delta$. For, if not, the ensemble average would be worse than $\delta$. Select such a code and delete from it the words having this property. We have reduced our rate only by at most $\frac{1}{n} \log (1 - \delta)^{-1}$. Since $\epsilon$ and $\delta$ were both arbitrarily small, we obtain error-free codes arbitrarily close to the rate $-\log \min_{P_i} \sum_{ij} A_{ij} P_i P_j$ as stated in the theorem.
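The quantity driving this argument is easy to evaluate for a given letter distribution. The sketch below is illustrative only: it does not perform the minimization over $P_i$, it simply evaluates the quadratic form for a chosen $P$, and any such $P$ yields a valid (possibly weaker) lower bound because $-\log$ of the form can only be smaller than $-\log$ of its minimum. The five-letter cyclic adjacency matrix is an assumed example, logarithms are taken to base 2, and the function names are hypothetical.

```python
import math


def adjacency_form(A, P):
    """Probability that two independently drawn letters are adjacent: sum of A_ij * P_i * P_j."""
    n = len(P)
    return sum(A[i][j] * P[i] * P[j] for i in range(n) for j in range(n))


def rate_lower_bound(A, P):
    """Zero-error rate (bits per letter) guaranteed by the random-coding argument for this P."""
    return -math.log2(adjacency_form(A, P))


# Assumed example: five letters on a cycle, each adjacent to itself and its two neighbours.
A = [[1 if (i - j) % 5 in (0, 1, 4) else 0 for j in range(5)] for i in range(5)]

uniform = [0.2] * 5
two_letters = [0.5, 0.0, 0.5, 0.0, 0.0]   # all weight on the non-adjacent letters 0 and 2

form_uniform = adjacency_form(A, uniform)
form_two = adjacency_form(A, two_letters)
bound_two = rate_lower_bound(A, two_letters)
```

In this example, concentrating the probability on two non-adjacent letters gives a smaller value of the form than the uniform distribution does, and so a better bound.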

For simple channels it is usually more convenient to apply particular tricks in trying to evaluate $C_0$ instead of the bounds given in this theorem, which involve maximizing and minimizing processes. The simplest lower bound, as mentioned before, is obtained by merely finding the logarithm of the maximum number of non-adjacent input letters.

A useful device for establishing an upper bound depends upon the adjacency graph for the input symbols. Suppose two vertices a and b of this graph have the property that they are connected together and every vertex that a is connected to, b is also connected to (but not necessarily conversely). Then vertex b and all lines connected to b may be eliminated from the graph, leaving an adjacency graph for channels with the same zero error capacity. This may be proved by constructing from any error-free code for channels with the first graph an error-free code with the same number of words for the second graph. This is done by replacing in all words of the first code the letter for vertex b wherever it occurs by the letter for vertex a. This does not change adjacency relations among words since a is adjacent to no points that were not already adjacent to b.

Another device which is useful for finding upper bounds is that of eliminating lines in the graph. Eliminating one or more lines in a graph can only increase or leave constant $C_0$, since any zero-error code for the old channel will be zero-error for the new channel. By careful choice of one or more lines to eliminate, the graph may be reduced to one for which $C_0$ is readily evaluated, and if this $C_0$ equals the lower bound found by choosing a subset of non-adjacent letters, then this gives the zero-error capacity.

These devices, as well as others, may be described in more general
terms by means of an adjacency-reducing mapping. Suppose that we
can find a mapping of letters into other letters, $i \to \alpha(i)$, with the
property that if i and j are not adjacent in the channel (or graph) then
$\alpha(i)$ and $\alpha(j)$ are not adjacent. If we have a zero-error code, then we
may apply such a mapping letter by letter to the code and obtain a new
code which will also be of the zero-error type, since no adjacencies can
be produced by the mapping. If all of the letters i are mapped into a
subset of the letters, no two of which are adjacent, then it is easily
seen that the zero-error capacity of the original channel is the logarithm
of the number of letters in this subset. For, in the first place, by
forming all sequences of these letters we obtain a zero-error code at this
rate. Secondly, any code of length n in the channel can be mapped into a
code using only these letters and containing, therefore, only $2^{C_0 n}$
non-adjacent words (with $C_0$ here the base-2 logarithm of the number of
letters in the subset).
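
As a hedged illustration of the mapping method (not part of the original
text), the following Python sketch checks whether a proposed mapping is
adjacency-reducing. The function name `is_adjacency_reducing`, the edge-list
representation of the adjacency graph, and the two example graphs are
assumptions made for this example only.

```python
from itertools import combinations, product


def is_adjacency_reducing(edges, mapping):
    """Return True if every pair of distinct non-adjacent letters is
    mapped to a pair of distinct non-adjacent letters."""
    adjacent = {frozenset(e) for e in edges}
    for i, j in combinations(sorted(mapping), 2):
        if frozenset((i, j)) in adjacent:
            continue  # only non-adjacent pairs are constrained
        a, b = mapping[i], mapping[j]
        # A letter is adjacent to itself, so equal images are not allowed.
        if a == b or frozenset((a, b)) in adjacent:
            return False
    return True


# Hypothetical example 1: four letters whose adjacency graph is the path 0-1-2-3.
path = [(0, 1), (1, 2), (2, 3)]
path_mapping = {0: 0, 1: 0, 2: 2, 3: 2}  # the image {0, 2} is non-adjacent
path_ok = is_adjacency_reducing(path, path_mapping)

# Hypothetical example 2: five letters on a pentagon. Try every mapping of
# the five letters into the non-adjacent pair {0, 2}.
pentagon = [(i, (i + 1) % 5) for i in range(5)]
pentagon_ok = any(
    is_adjacency_reducing(pentagon, dict(enumerate(images)))
    for images in product((0, 2), repeat=5)
)
```

For the path, the mapping sends all letters into two non-adjacent letters,
so by the argument above its zero-error capacity is log 2. For the pentagon
no such mapping into two non-adjacent letters exists, so the mapping method
alone does not settle that case.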

The capacities, or, more exactly, the equivalent numbers of input
symbols for all graphs up to five vertices are shown in Fig. 4. These can
all be found readily by the tricks mentioned above, excepting the channel
of Fig. 2 mentioned previously, for which we know only that the zero-error
capacity lies in the range $\tfrac{1}{2} \log 5 \le C_0 \le \log \tfrac{5}{2}$.

All graphs with six vertices have been examined and the capacities
of all of these can also be found by these devices, with the exception of
four. These four can be given in terms of the capacity of Fig. 2, so that
this latter graph is essentially the only unsolved problem up to seven
vertices. Graphs with seven vertices have not been completely examined
but at least one new situation arises, the analog of Fig. 2 with seven
input letters.
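
The lower bound $\tfrac{1}{2} \log 5$ quoted for Fig. 2 can be checked by direct
enumeration. The sketch below assumes, since the figure is not reproduced
here, that Fig. 2 is the five-letter channel whose adjacency graph is a
pentagon (each letter adjacent to itself and to its two neighbours). The
helper names and the particular five code words are illustrative choices,
not taken from the text.

```python
from itertools import combinations


def adjacent_letters(a, b, m=5):
    """Letters on an m-cycle are adjacent if equal or neighbours mod m."""
    return (a - b) % m in (0, 1, m - 1)


def adjacent_words(u, v):
    """Two words are adjacent if they are adjacent in every position."""
    return all(adjacent_letters(a, b) for a, b in zip(u, v))


# Five words of length two over the letters 0..4.
code = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]

# The code is zero-error if no two distinct words are adjacent.
is_zero_error = not any(adjacent_words(u, v) for u, v in combinations(code, 2))
```

Five mutually non-adjacent words of length two correspond to a rate of
$\tfrac{1}{2} \log 5$ per letter, which exceeds the $\log 2$ obtainable from
single non-adjacent letters.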


Theorem: If two channels have zero-error capacities $C_0'$ and $C_0''$, their
sum has a zero-error capacity greater than or equal to
$\log\left(\exp C_0' + \exp C_0''\right)$
and their product a zero-error capacity greater than or equal to $C_0' + C_0''$.
If the graph of either of the two channels can be reduced to non-adjacent
points by the mapping method, then these inequalities can be replaced
by equalities.

Proof: It is clear that in the case of the product, the zero-error
capacity is at least $C_0' + C_0''$, since we may form a product code from two
codes with rates which are close to $C_0'$ and $C_0''$. If these codes are not
of the same length, we use for the new code the least common multiple of
the individual lengths and form all sequences of the code words of each of
the codes up to this length. To prove equality in case one of the graphs,
say that for the first channel, can be mapped into non-adjacent points,
suppose we have a code of length n for the product channel. Write
$A = \exp C_0'$ for the number of non-adjacent points into which the first
graph is mapped and $B = \exp C_0''$ for the equivalent number of letters of
the second channel. The letters for the product code, of course, are
ordered pairs of letters corresponding to the original channels. Replace
the first letter in each pair in all code words by the letter corresponding
to reduction by the mapping method. This reduces or preserves adjacency
between words in the code. Now sort the code words into $A^n$ subsets
according to the sequences of first letters in the ordered pairs. Each of
these subsets can contain at most $B^n$ members, since this is the largest
possible number of code words for the second channel of this length. Thus,
in total, there are at most $A^n B^n$ words in the code, giving the desired
result.

In the case of the sum of the two channels, we first show how, from
two given codes for the two channels, to construct a code for the sum
channel with equivalent number of letters equal to $A^{1-\delta} + B^{1-\delta}$, where
$\delta$ is arbitrarily small and A and B are the equivalent numbers of letters
for the two codes. Let the two codes have lengths $n_1$ and $n_2$. The new
code will have length n, where n is the smallest integer greater than both
$n_1/\delta$ and $n_2/\delta$. We now form codes for the first channel and for the
second channel for all lengths k from zero to n as follows. Let k equal
$a n_1 + b$, where a and b are integers and $b < n_1$. We form all sequences of
a words from the given code for the first channel and fill in the remaining
b letters arbitrarily, say all with the first letter in the code alphabet.
We achieve at least $A^{k - \delta n}$ different words of length k, none of which
is adjacent to any other. In the same way we form codes for the second
channel and achieve $B^{k - \delta n}$ words in this code of length k. We now
intermingle the k code for the first channel with the n - k code for the
second channel in all $\binom{n}{k}$ possible ways and do this for each value
of k. This produces a code n letters long with at least

$$\sum_{k=0}^{n} \binom{n}{k} A^{k - \delta n} B^{n - k - \delta n} = (AB)^{-\delta n} (A + B)^n$$

different words. It is readily seen that no two of these different words
are adjacent. The rate is at least $\log (A + B) - \delta \log AB$, and since $\delta$
was arbitrarily small, we can achieve a rate arbitrarily close to
$\log (A + B)$.

To show that it is not possible, when one of the graphs reduces to
non-adjacent points, to exceed the rate corresponding to the number of
letters A + B, consider any particular code of length n for the sum channel.
The words in this consist of sequences of letters each corresponding to
one or the other of the two channels. The words may be subdivided into
classes corresponding to the pattern of the choices of letters between the
two channels. There are $2^n$ such classes with $\binom{n}{k}$ classes in which
exactly k of the letters are from the first channel and n - k from the
second. Consider now a particular class of words of this type. Replace the
letters from the first channel alphabet by the corresponding non-adjacent
letters. This does not harm the adjacency relations between words in the
code. Now, as in the product case, partition the code words according to
the sequence of letters involved from the first channel. This produces
at most $A^k$ subsets. Each of these subsets contains at most $B^{n-k}$ members,
since this is the greatest possible number of non-adjacent words for the
second channel of length n - k. In total, then, summing over all values
of k and taking account of the $\binom{n}{k}$ classes for each k, there are at
most

$$\sum_{k=0}^{n} \binom{n}{k} A^k B^{n-k} = (A + B)^n$$

words in the code for the sum channel. This proves the desired result. We
conjecture but have not been able to prove that the equality of this
theorem holds in general, not merely under the conditions given.

Theorem: In any code of length n and rate $R > C_0$, $C_0 > 0$, the probability
of error $P_e$ will satisfy

$$P_e \ge \left(1 - e^{-n(R - C_0)}\right) p_{\min}^{n},$$

where $p_{\min}$ is the minimum non-vanishing $p_i(j)$. Thus for $R > C_0$,
$E(R) \le -\log p_{\min}$.

Proof: By definition of $C_0$ there are not more than $e^{nC_0}$ non-adjacent
words of length n. With $R > C_0$, among $e^{nR}$ words there must, therefore,
be an adjacent pair. The adjacent pair has a common output word which
either can cause with a probability at least $p_{\min}^{n}$. This output word
cannot be decoded into both inputs. At least one, therefore, must cause an
error when it leads to this output word. This gives a contribution at least
$e^{-nR} p_{\min}^{n}$ to the probability of error $P_e$. Now omit this word from
consideration and apply the same argument to the remaining $e^{nR} - 1$ words
of the code. This will give another adjacent pair and another contribution
of error of at least $e^{-nR} p_{\min}^{n}$. The process may be continued until
the number of code points remaining is just $e^{nC_0}$. At this time, the
probability of error must be at least
$(e^{nR} - e^{nC_0})\, e^{-nR}\, p_{\min}^{n}$, or the expression given in the theorem.
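
As a small numerical sketch of this bound (an illustration only; the
function names and the sample parameter values are assumptions, and natural
logarithms are used throughout), the following Python code evaluates
$(1 - e^{-n(R - C_0)})\, p_{\min}^{n}$ and the exponent it implies.

```python
from math import exp, log


def error_lower_bound(n, R, C0, p_min):
    """Evaluate (1 - e^{-n(R - C0)}) * p_min**n, valid for R > C0."""
    if R <= C0:
        raise ValueError("the bound applies only for R > C0")
    return (1.0 - exp(-n * (R - C0))) * p_min ** n


def implied_exponent(n, R, C0, p_min):
    """Return -(1/n) log of the bound; this tends to -log p_min as n grows."""
    return -log(error_lower_bound(n, R, C0, p_min)) / n


# Hypothetical values: n = 200, R = 1.0, C0 = 0.5, p_min = 0.1.
example_exponent = implied_exponent(200, 1.0, 0.5, 0.1)
```

For large n the factor $1 - e^{-n(R - C_0)}$ approaches one, so the exponent
approaches $-\log p_{\min}$, in agreement with the remark that
$E(R) \le -\log p_{\min}$ for $R > C_0$.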


FIG. 4

All graphs with 1, 2, 3, 4, 5 nodes and the corresponding $N_0$ for channels
with these as adjacency graphs (note $C_0 = \log N_0$).

Lower Bound for $P_{ef}$ for a Completely Connected Channel with
Feedback

Theorem: $P_{ef} \ge \left(1 - \frac{1}{M}\right) p_{\min}^{n}$, where M is the number
of messages, the channel is assumed completely connected, and
$p_{\min}$ is the minimum transition probability. Note that if $M \ge 2$,
$P_{ef} \ge \tfrac{1}{2}\, p_{\min}^{n}$.

Proof: Choose any two messages m and m'. (If there
is only one message, the theorem is trivially true.) Let
$x_1$ and $x_1'$ be the first transmitted letters for m and m'.
Since the channel is completely connected, $x_1$ and $x_1'$ have
a common output letter, say $y_1$. Determine the second
transmitted letters for m and m' if $y_1$ is received and let these
be $x_2$ and $x_2'$. These must have a possible common received
letter $y_2$. Find the third transmitted letters for m when
$y_1 y_2$ was received and for m' when $y_1 y_2$ was received. Let
these be $x_3$ and $x_3'$. Continue this process to give a
received sequence $y_1, y_2, \ldots, y_n$ which might occur with either
m or m'. Each could cause this sequence with probability
greater than or equal to $p_{\min}^{n}$. At the receiver this sequence
must be decoded in a unique way, hence one, at least, of
m and m' would be decoded incorrectly if it caused this
received sequence. Say this is m; then m can cause errors
to the amount at least $\frac{1}{M}\, p_{\min}^{n}$. Now, eliminating m from
further consideration, take any pair of messages from the
remaining M - 1 (including m'). The same argument may be
applied to this pair to give a second source of error,
disjoint to the first, to the amount $\frac{1}{M}\, p_{\min}^{n}$. Continuing in
this way, we can arrive at M - 1 disjoint sources of error,
each at least $\frac{1}{M}\, p_{\min}^{n}$, a total of at least
$\left(1 - \frac{1}{M}\right) p_{\min}^{n}$, proving the theorem.

A Lower Bound for $P_e$ when R > C


Theorem: For any code with rate R > C, $R - C = \delta$, with
block length $n > \frac{2}{\delta} \log 2$, we have

$$P_e \ge \frac{\delta}{4\,(R - \log p_{\min})}.$$

Hence for any fixed $\delta > 0$, $P_e$ is bounded away from zero.

Proof:

$$P_e \ge \tfrac{1}{2}\, \rho\!\left(R - \tfrac{1}{n} \log 2\right)
      \ge \tfrac{1}{2}\, \rho\!\left(R - \tfrac{\delta}{2}\right)
      = \tfrac{1}{2}\, \rho\!\left(C + \tfrac{\delta}{2}\right).$$

For any pair of code words (u, v) such that p(u, v) > 0,
the mutual information I(u, v) satisfies

$$\frac{1}{n}\, I(u,v) = \frac{1}{n} \log \frac{p_u(v)}{p(v)}
      \ge \frac{1}{n} \log p_{\min}^{n} = \log p_{\min}.$$

Now whatever distribution p(u) is used, the mean of $\frac{1}{n} I(u,v)$
is less than or equal to C (by the very definition of C).
We have a distribution function $\rho(I)$ which is zero for $I < \log p_{\min}$
and whose mean is less than or equal to C. This implies a lower
bound on $\rho(C + \delta/2)$. In fact, we must have $\rho(C + \delta/2)$ greater
than or equal to

$$\frac{\delta/2}{C + \delta/2 - \log p_{\min}},$$

for if not, the mean of the distribution would be at least

$$\rho(C + \delta/2) \log p_{\min} + (C + \delta/2)\left[1 - \rho(C + \delta/2)\right],$$

which in turn would be greater than

$$\log p_{\min}\, \frac{\delta/2}{C + \delta/2 - \log p_{\min}}
      + (C + \delta/2)\, \frac{C - \log p_{\min}}{C + \delta/2 - \log p_{\min}} = C.$$

This is a contradiction and consequently

$$P_e \ge \frac{1}{4}\, \frac{\delta}{C + \delta/2 - \log p_{\min}}
      \ge \frac{1}{4}\, \frac{\delta}{R - \log p_{\min}}.$$
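
To make the bound concrete, here is a minimal Python sketch that evaluates
both the intermediate bound $\delta / [4 (C + \delta/2 - \log p_{\min})]$ and the
stated bound $\delta / [4 (R - \log p_{\min})]$. The function names and the
sample values are assumptions for illustration; natural logarithms are used.

```python
from math import log


def check_block_length(R, C, n):
    """Verify the hypotheses R > C and n > (2 / delta) log 2."""
    delta = R - C
    if delta <= 0:
        raise ValueError("requires R > C")
    if n <= 2.0 * log(2.0) / delta:
        raise ValueError("requires n > (2 / delta) * log 2")
    return delta


def stated_bound(R, C, p_min, n):
    """delta / (4 (R - log p_min)), the bound given in the theorem."""
    delta = check_block_length(R, C, n)
    return delta / (4.0 * (R - log(p_min)))


def intermediate_bound(R, C, p_min, n):
    """delta / (4 (C + delta/2 - log p_min)), the sharper form in the proof."""
    delta = check_block_length(R, C, n)
    return delta / (4.0 * (C + delta / 2.0 - log(p_min)))


# Hypothetical values: R = 1.0, C = 0.5, p_min = 0.1, n = 10.
weak = stated_bound(1.0, 0.5, 0.1, 10)
sharp = intermediate_bound(1.0, 0.5, 0.1, 10)
```

Neither value depends on n once the block length condition holds, which is
the sense in which $P_e$ is bounded away from zero for fixed $\delta$.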


A Lower Bound for $P_e$


We will say that the input letters in a channel are uniform if each
of these letters has the same set of values for transition probabilities
to output letters (not necessarily to the same output letters). In the
$p_i(j)$ matrix each row is some rearrangement of the numbers in the first
row. If this is true, it is clear that the transition probabilities
for words of length n in this channel have the same property. In
fact, the transition probabilities from a particular input word will consist
of the $r^n$ products that can be formed from the r transition probabilities
for the original channel taken n at a time with repetition allowed.
Suppose that when these $r^n$ transition probabilities are arranged in order
of decreasing value, the total probability after element number d
is $Q = Q_n(d)$.

Theorem: In a channel with uniform input letters, r output letters and
the function $Q_n(d)$, any block code of length n and rate R has a probability
of error $P_e$ satisfying

$$P_e \ge Q_n\!\left(\left[e^{n(\log r - R)}\right] + 1\right),$$

where the brackets denote the integer part.

Proof: Suppose we have given a code with $e^{Rn}$ words. The probability of
not making an error, $1 - P_e$, may be computed by taking the probability
of use for each word, $e^{-Rn}$, and multiplying by the sum of the transition
probabilities from that word to all output words which are decoded as the
given word. When summed over all input words in the code, this gives
$1 - P_e$. Thinking in terms of the matrix of word transition probabilities,
this means that a certain selected set of entries from each row is added
together and the final result multiplied by $e^{-Rn}$. The total number of
entries added in all the different rows is exactly equal to $r^n$, since this
is the total number of output words and each is decoded into exactly one
input word. The sum of elements in a particular row is increased or
unchanged if we take, in place of the given elements, the same number of
elements chosen in order of decreasing value. Because of the assumption
of uniform inputs, all the rows have the same sequence of values when
arranged in monotone decreasing order. Thus our first operation has
served to give us the sum of $e^{Rn}$ (one for each row) beginnings of this
sequence of various lengths.

If any two of the rows have different numbers of elements added
into the sum, we can again increase or leave unchanged the total by
equalizing (as nearly as possible) the number of terms from the two rows,
since this replaces smaller valued terms by larger ones. Proceeding in
this manner we increase or leave constant the sum while holding the total
number of terms at exactly $r^n$. When the equalization of number of entries
from rows has proceeded as far as possible, the number in each row will be
within one of $r^n / e^{Rn}$. More precisely, let $r^n / e^{Rn}$ equal
$A + B / e^{Rn}$, where A and B are integers and $B < e^{Rn}$. Then B of the rows
will have A + 1 terms and the remaining $e^{Rn} - B$ will have A terms. We
will then have

$$1 - P_e \le e^{-Rn} \left[ B\,(1 - Q(A + 1)) + (e^{Rn} - B)(1 - Q(A)) \right],$$

$$P_e \ge e^{-Rn} B\, Q(A + 1) + (1 - e^{-Rn} B)\, Q(A)
      \ge Q(A + 1)
      = Q\!\left(\left[e^{n(\log r - R)}\right] + 1\right).$$

Lower Bound with One Type of Input and Many Types of Output

The inequality we have proved holds in any case where the inputs are
uniform. However, it may be strengthened in certain cases. Suppose that
the input letters (or words) are uniform in the sense previously defined
and that the output letters (or words) can be partitioned into a number
of subsets $S_1, S_2, \ldots$ with the following property. Each input letter
(or word) has the same set of transition probabilities leading to words
in $S_i$ as any other input word, for each i. Thus, the channel looks uniform
for output words when only the input words and output words in any
particular $S_i$ are considered. Let $N_i$ be the number of output words in $S_i$.
Let $Q_i(d)$ be the probability in the tail for $S_i$ analogous to the Q(d) of
the preceding theorem. Thus $Q_i(d)$ is the total probability after d
elements in the monotone decreasing ordered sequence of all probabilities
from an input word to the output words in $S_i$.

We may argue precisely as we did before for each particular $S_i$ and
obtain a lower bound for the probability of errors occurring with received
signals in the set $S_i$. The total probability of error $P_e$ is greater than
or equal to the sum on i of these individual contributions:

$$P_e \ge \sum_i Q_i\!\left(\left[N_i e^{-Rn}\right] + 1\right).$$

A more general case may be defined as follows. Suppose the input
words can be partitioned into subsets $T_1, T_2, \ldots, T_c$ and the output words
into subsets $S_1, S_2, \ldots$, and the channel is uniform in transitions
from input set $T_i$ to output set $S_j$, that is, every member of $T_i$ has the
same array of transition probabilities to members of $S_j$. It is always
trivially possible to perform such a partitioning by placing all input
letters in different subsets and all output letters also in different
subsets. More significantly, if we consider words of length n, we may
perform this partitioning by subdividing the input words into subsets
according to their composition in terms of letters. Thus, if the letters
in the channel are a, b, ..., g, a composition is defined by a set of
integers $n_a, n_b, \ldots, n_g$ whose sum is n. All words with exactly $n_a$ a's,
$n_b$ b's, ..., $n_g$ g's will be placed in the corresponding input class.
This class would then have $n! / (n_a!\, n_b! \cdots n_g!)$ members. In an exactly
similar way the output words of length n can be partitioned into
compositions in terms of output letters. It is immediately seen that each
word in a particular input class has the same transition probabilities to a
certain output class as any other word in the same input class. Thus this
decomposition is of the type we are considering.
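
The count of words in a composition class can be confirmed by enumeration.
This Python sketch is illustrative only: the three-letter alphabet `"abc"`,
the word length 4, and the function names are assumptions chosen to keep the
enumeration small.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def class_size(composition):
    """n! / (n_a! n_b! ... n_g!) for a composition (n_a, n_b, ..., n_g)."""
    n = sum(composition)
    return factorial(n) // prod(factorial(c) for c in composition)


def composition_classes(alphabet, n):
    """Group all words of length n by composition and count each class."""
    classes = Counter()
    for word in product(alphabet, repeat=n):
        counts = Counter(word)
        classes[tuple(counts[a] for a in alphabet)] += 1
    return classes


# Hypothetical example: words of length 4 over the letters a, b, c.
classes = composition_classes("abc", 4)
sizes_agree = all(class_size(c) == m for c, m in classes.items())
```

Every word falls in exactly one class, so the class sizes add up to the
total number of words of length n.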

We return now to the calculation of a lower bound for $P_e$ with a
given number of code words $M = e^{Rn}$. Our procedure is similar to that used
previously; we perform operations which reduce (or leave unchanged) the
probability of error and arrive eventually at an easily computed value.
Suppose a given code has $M_i$ members in input class $T_i$ (i = 1, 2, ..., c).
Let $N_j$ as before be the total number of words in output class $S_j$, and let
$Q_{ij}(d)$ be the total probability in the tail beyond entry d when the
transition probabilities from a member of set $T_i$ to the words in $S_j$ are
arranged in a monotone decreasing sequence. There will be errors in output
set $S_j$ at least to the amount $\min_i Q_{ij}(N_j e^{-Rn} + 1)$. This is true since
we may reduce the probability of error by equalizing the tails as before
for all words from the same input class. Then one may again reduce or leave
unchanged the probability of error by replacing words from other input
classes by that which minimizes the expression. The details are simple.
The total $P_e$ can be bounded from below by summing this over all output
classes:

$$P_e \ge \sum_j \min_i Q_{ij}\!\left(N_j e^{-Rn} + 1\right).$$

Another lower bound can be obtained by a slightly different argument.
If there are c input classes and $e^{Rn}$ input words, there must be a class
with at least $e^{Rn}/c$ input words. If class i contains this many words,
the  probability  of  error  will  be  bounded  at  least  by  ^{IMS*"***  1) 

since the situation is that covered by the uniform input result. If we
minimize this on i, then we will certainly have a lower bound for $P_e$
regardless of which class contains the $e^{Rn}/c$ or more code words. Thus


A somewhat stronger but more complex lower bound on $P_e$ can be obtained
by a still different variation of these arguments.
Let

"  Probability  in  the  tail  of  the  monotone  sequence  of 
transition  probabilities  from  input  set  i  to  output 
set  j,  the  tail  consisting  of  probabilities  less  than 
P  in  value. 

■  total  number  of  terms  in  this  sequence  with  pro- 
babilities greater  than  or  equal  to  P. 
a.         m  total  number  of  wor'-s  in  output  set  i. 
Then we will show that the probability of error $P_e$ satisfies

pe  y,  ^^ssA  v  (V  a) 

Where  the  P..  satisfy 

»d«flfc*i       \.    (fy  (2) 

The argument here is similar to those before. We assume $M_i$ messages
coded into input set i. To obtain the maximum probability in the parts of
the tails of the distributions they should be equalized as nearly as
possible to end at the same value of probability for the last term taken.
While this equalization will not, in general, come out even, a value $P_{ij}$
satisfying (2) will be small enough that all the tails of the different
sequences beyond this $P_{ij}$ will cause error after the nearest possible
equalization. Thus, $P_e$ will have a lower bound given by (1). The
minimizing, of course, takes account of the most favorable possible way of
dividing the M messages among the input classes.


Application of "Sphere-packing" Bounds to Feedback Case

In the uniform input case, the lower bounds on the probability of
error based on the sphere-packing type of argument apply also to memory-
less discrete channels which have a feedback link giving information at
the transmitter concerning the previous received letter.

To show this, suppose we have such a uniform input case where the
input letters all have the same set of transition probabilities going to
output letters. Suppose we have a block code for the feedback system of
length n. This means that at the transmitting point there is a device with
two inputs, or, mathematically, a function with two arguments. One
argument is the message to be transmitted, the other, the past received
letters (which have come in over the feedback link). The value of the
function is the next letter to be transmitted. Thus, the function may be
thought of as $x_{j+1} = f(k, v_j)$, where $x_{j+1}$ is the (j + 1)st transmitted
letter in a block, k is an index ranging from one to $e^{Rn}$ and represents
the specific message, and $v_j$ is a received word of length j. Thus j ranges
from 0 to n - 1 and $v_j$ over all received words of these lengths.

In operation, if message $m_k$ is to be sent, f is evaluated for f(k, -),
where the - means "no word", and this is sent as the first transmitted
letter. If the feedback link sends back $\alpha$, say, as the received letter,
the next transmitted letter will be $f(k, \alpha)$. If this is received as $\beta$,
the next transmitted letter will be $f(k, \alpha, \beta)$, etc.

Remembering our assumption about uniformity, the first transmitted letter
for any message gives rise to a set of received letters with probabilities
$q_1, q_2, \ldots, q_t$ (these being the transition probabilities from any
letter). In each case (that is, each (message, received letter) pair), a
second transmitted letter is determined by the function f. Since the letters
are uniform, each gives rise to a second set of letters with probabilities
$q_1, q_2, \ldots, q_t$. The probabilities are the same in all cases although
the letters to which they apply may differ. Thus, for each message choice
$m_k$ there exists a set of possible received two-letter sequences with the
same set of probabilities, namely, all pairs $q_i q_j$. Continuing in this
manner, each message $m_k$ when fully transmitted gives rise at the receiver
to a set of possible received words of length n with the same array of
probabilities (regardless of the particular message or the particular
noise). These probabilities are the set of all nth degree products of terms
from $q_1, q_2, \ldots, q_t$.
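
This invariance can be checked on a toy case. The Python sketch below is an
illustration under assumptions: the 3-by-3 transition matrix `P` is a
hypothetical uniform channel (each row a rearrangement of the first), and
`feedback_rule` is an arbitrary stand-in for the function $f(k, v_j)$, not a
rule taken from the text.

```python
from itertools import product
from math import isclose, prod

# Hypothetical uniform channel: every row rearranges (0.7, 0.2, 0.1).
P = [[0.7, 0.2, 0.1],
     [0.1, 0.7, 0.2],
     [0.2, 0.1, 0.7]]


def feedback_rule(k, received):
    """Arbitrary illustrative f(k, v_j): next input letter given message
    index k and the tuple of letters received so far."""
    return (k + sum(received) + len(received)) % len(P)


def received_word_probs(k, n):
    """Probabilities of all received words of length n for message k."""
    probs = []
    for v in product(range(len(P)), repeat=n):
        p = 1.0
        for j in range(n):
            x = feedback_rule(k, v[:j])  # letter sent after receiving v[:j]
            p *= P[x][v[j]]
        probs.append(p)
    return sorted(probs, reverse=True)


def product_probs(n):
    """All nth degree products of the entries of the first row."""
    return sorted((prod(c) for c in product(P[0], repeat=n)), reverse=True)


def same_array(k, n):
    """True if message k yields the same array of probabilities."""
    return all(isclose(a, b, rel_tol=1e-9)
               for a, b in zip(received_word_probs(k, n), product_probs(n)))
```

Whatever letter the rule selects, the next received letter has probabilities
that are a rearrangement of the first row, so the sorted array of word
probabilities does not depend on the message or on the rule.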

At the receiver, a received word must be decoded in a unique way.
The probability of error when message $m_i$ is transmitted is the sum of the
above-mentioned transition probabilities to all words of length n which are
not decoded as $m_i$. If $a_i$ received words are decoded as message $m_i$, then
$\sum_i a_i$ is the total number of different received words of length n. If the
transition probabilities are arranged in monotone decreasing order, the
probability of error for message $m_i$ is greater than or equal to the sum of
terms in this decreasing sequence after term $a_i$, since the sum of the first
$a_i$ terms of a monotone decreasing sequence overbounds the sum of any other
$a_i$ terms. Thus, our estimate of $P_e$ is decreased by taking the first $a_i$
terms for each message $m_i$.

Since the sequences for the different messages $m_i$ are actually the
same, it is again decreased by equalizing, as nearly as possible, the
different $a_i$. This gives the simplest lower bound on $P_e$.


The more involved and sharper result, where the different classes of
received words are considered, follows by essentially the same argument,
on noticing that each transmitter choice gives rise to the different $q_{ij}$
transition probabilities and that the equalization may be carried out
within these classes as before, always reducing the estimate of $P_e$.

While it seems likely that the more general results, where the input
letters are not uniform (or slight modifications of these results), hold
for the feedback case, no proof has been found. There is, indeed, some
extra difficulty here because the transmitter can now take positive and
useful action depending on the result at the receiver of earlier parts
of the message. In the uniform input case, no very significant action is
possible, since all the letters are statistically alike so far as the
sphere-packing properties are concerned.


Theorem: Suppose in a channel words of length d can be partitioned into
$e^{C_0 d}$ completely connected subsets and we have given a code of length
n + d with M words and with probability of error $P_{eL}$. Then we can
construct a code of length n with at least $\tfrac{1}{2} M e^{-C_0 d}$ words and with
probability of error $P_{es} \le 2\, p_{\min}^{-d}\, P_{eL}$, where $p_{\min}$ is the smallest
(nonvanishing) $p_i(j)$ for the channel. If $C_0 = 0$ for the channel we can
construct, more strongly, the code of length n with M words and probability
of error $P_{es} \le p_{\min}^{-d}\, P_{eL}$.

Corollary: Let $(R_1, E_1)$ be any point on the reliability curve for a channel.
Construct the straight line through this point and the point
$(C_0, \log p_{\min}^{-1})$. The reliability curve lies below or on this straight
line for $C_0 < R < R_1$ and above or on it for $R > R_1$. In particular, it lies
below the line segment joining $(C_0, \log p_{\min}^{-1})$ and (C, 0), where C is
the capacity of the channel.

Proof: We will refer to the given code of length n + d as the long code
and codes of length n derived from it as short codes. For simplicity
we will first consider the case where $C_0 = 0$. The short code is then
obtained by merely deleting the last d letters of each of the words in
the long code. Thus, in the long code, let us designate the words by
$T_1 U_1, T_2 U_2, \ldots, T_M U_M$.

These words correspond to the M different messages and some of them
may consist of the same sequence of input letters (although in general for
a good code, this would not be the case).

The short code consists of the words $T_1, T_2, \ldots, T_M$. The decoding
process for the short code will be maximum likelihood. Thus, if the
received words corresponding to the short code are $V_1, V_2, \ldots$, and $V_j$
is received, it is decoded as that $T_i$ with maximum conditional probability
given $V_j$. Since the $T_i$ are used with equal probability, this is the $T_i$
whose probability of causing $V_j$ is a maximum. We now show that the
probability of error in the short code when a particular V, say $V_j$, is
received is less than or equal to $p_{\min}^{-d}$ multiplied by the corresponding
probability for the long code, where we must, of course, consider all the
possible received signals $W_1, W_2, \ldots$ corresponding to the U part of the
long code. Let $T_{ML}$ be the maximum likelihood detection for the short code
when $V_j$ is received, and let $U_{ML}$ be the U part of the long code for
$T_{ML}$. Also let the long code decoding system, when $V_j$ and a particular W
are received, decode it as a definite message, that is, decide that a
particular word $T_i U_i$ was transmitted.

Since  Co  -  0,  each  pair-  of  words  of  length  d  fca*«  a  wsibis  cex&xi 
received  word.    In  particuLsr,  for  each  k.  and  U.r  have  a  E  in  coEmon. 

Hence we can find a set of $W$'s, $W_1, W_2, \ldots, W_s$, such that each one is a
possible result of $U_m$ and one or more of the $U_k$, and every $U$ has
some $W$ in the subset as a possible result. Now the error in the short
code (when $V_1$ is received) is given by $\sum_{i \ne m} P_{V_1}(U_i)$, that is,
the probability of all other transmitted words except the maximum
likelihood one (conditional on the received $V_1$). Consider the probability
of error for the long code when $V_1$ is received and in particular those
errors resulting when $U_i$ or $U_m$ is received as $W_a$ (say), $W_a$ being the $W$
common to $U_i$ and $U_m$. Either of $U_i$ or $U_m$ can cause $W_a$ with probability
greater than or equal to $p_{\min}^d$. Whether $W_a$ is decoded as $i$ or $m$ (or some
other way), errors will occur with probability $\ge p_{\min}^d P_{V_1}(U_i)$, since
$P_{V_1}(U_m) \ge P_{V_1}(U_i)$ (since $U_m$ was the maximum likelihood $U$). If there
are several $U$'s leading into $W_a$, we will again have errors caused with
probability at least $p_{\min}^d \sum_i P_{V_1}(U_i)$, summed over this set of $i$, since
if $W_a$ is decoded as one of the $U_i$, the larger $P_{V_1}(U_m)$ takes its place.
In total, then, summing over all the $W$'s in our selected subset, we get a
total probability of error for the long code, when $V_1$ is received, of at
least

$$p_{\min}^d \sum_{i \ne m} P_{V_1}(U_i) = p_{\min}^d P_{es}.$$

Summing this inequality over all $V_j$ with appropriate probabilities for the
$V_j$, we obtain the desired result

$$P_{es} \le p_{\min}^{-d} P_{eL}.$$




Now consider the case when $C_0 > 0$. We can subdivide the set of $U_i$
into $e^{C_0 d}$ subsets such that any two $U_i$ in the same subset are adjacent.
This subdivision partitions the $M$ code words for the long code into
$e^{C_0 d}$ subsets, giving $e^{C_0 d}$ codes for each of which the preceding argument
will apply. For each of these, therefore, the probability of error for
the short code is less than or equal to $p_{\min}^{-d}$ multiplied by the probability
of error for the corresponding part of the long code. By the combinatorial
argument used in connection with previous results, at least half the code
words are in codes of at least half average size, and the average error
for these code words is not greater than $2 p_{\min}^{-d} P_{eL}$. Hence, there exists
among these a code containing at least $\frac{1}{2} M e^{-C_0 d}$ words and
with probability of error

$$P_{es} \le 2\, p_{\min}^{-d} P_{eL}.$$

To prove the corollary, let the rate and the reliability of the
given long code be $R_1$ and $E_1$:

$$R_1 = \frac{1}{n} \log M, \qquad E_1 = -\frac{1}{n} \log P_{eL}.$$

Further, let the rate and reliability of the short code constructed from
this be $R_2$ and $E_2$:

$$R_2 \ge \frac{1}{n-d} \log M - \frac{1}{n-d} C_0 d - \frac{1}{n-d} \log 2 = (1+x) R_1 - x C_0 - \frac{1}{n-d} \log 2$$

$$E_2 \ge \frac{n}{n-d} E_1 + \frac{d}{n-d} \log p_{\min} - \frac{1}{n-d} \log 2 = (1+x) E_1 + x \log p_{\min} - \frac{1}{n-d} \log 2$$

where $x = \frac{d}{n-d}$. If now we consider a series of codes with increasing $n$
approaching the $E_1$ and $R_1$ of a point on the curve, then the last terms
above approach zero and the $E$ and $R$ of the corresponding series of short
codes have a limit supremum on or above the straight line defined by the
equations

$$R_2 = (1+x) R_1 - x C_0$$

$$E_2 = (1+x) E_1 + x \log p_{\min}.$$

This straight line passes through the point $(C_0, -\log p_{\min})$ and the point
$(R_1, E_1)$. The range $x > 0$ for which our statement is true corresponds
to points to the right of $(R_1, E_1)$ on the straight line. To the left of
$(R_1, E_1)$ the reliability curve must be on or below this straight line, for
if it were above the line, say at $(R_3, E_3)$, we could use this point for
the $(R_1, E_1)$ and obtain a higher value by the construction of these short
codes at the original $R_1$ rate.

This result, it may be noted, is very similar to Theorem    . Taken
together, they allow one to pass two straight lines through any given
point on the reliability curve, and assert that the curve lies within
one acute angle to the left of the given point and within the opposite
acute angle to the right.

A consequence of this construction is that $E$, regarded as a function
of $R$, is continuous at least for $C_0 < R < C$ and also that $R$, regarded as
a function of $E$, is continuous at least for $0 < E < E(C_0)$. This is evident
since, for any point inside these intervals, the straight line upper and
lower bounds force $E$ (or $R$) to approach the given point as $R$ (or $E$) does so.


Theorem: If we have a code with $M$ words, each of length $n$ and with
probability of error $P_e$, we can construct a code of at least $\frac{1}{2} A^{-d} M$
words of length $n-d$ and with a probability of error $P_e' \le 2 P_e$, where
$A$ is the number of distinct input letters.

Proof: Subdivide the $M$ given words into $A^d$ subsets according to the
first $d$ letters. The first subset consists of all the code words
containing the first input letter in all of the first $d$ positions. The
second subset contains the first letter in the first $d-1$ positions and
the second letter in its $d$th position, and so on, lexicographically.
At least half of the original words must be in subsets with $\frac{1}{2} A^{-d} M$
or more members, for the total number of words in not more than $A^d$
subsets each of size not more than $\frac{1}{2} A^{-d} M$ is less than or equal to
$\frac{1}{2} M$, that is, at most half the total. Hence the other half is in larger
subsets. Now consider these larger subsets. The average probability of
error in the original code for all words in these subsets is less than or
equal to $2 P_e$, since, if not, the average probability of error for all
words would be greater than $P_e$. The probability of error for these
larger subsets is a weighted average of the probabilities of error for
the individual larger subsets; hence, there exists an individual subset
with a probability of error less than or equal to $2 P_e$. If these words
alone are used, the probability of error can only be improved, and if the
first $d$ letters are deleted, the probability of error is unchanged.
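
The selection step in this proof can be tried numerically. The following is a minimal sketch, not part of the original argument: the word list, the per-word error probabilities and the helper name `best_prefix_class` are all hypothetical, and the per-word errors of the original code are used as an upper estimate for the shortened code.

```python
import random
from collections import defaultdict
from itertools import product


def best_prefix_class(words, errors, d, A):
    """Among classes of words sharing their first d letters, keep those with
    at least M / (2 * A**d) members and return the one with the lowest
    average error (as a list of word indices)."""
    M = len(words)
    classes = defaultdict(list)
    for index, word in enumerate(words):
        classes[word[:d]].append(index)
    min_size = M / (2 * A ** d)
    large = [members for members in classes.values() if len(members) >= min_size]
    return min(large, key=lambda members: sum(errors[i] for i in members) / len(members))


random.seed(0)
A, n, d, M = 2, 8, 2, 40                       # illustrative sizes only
all_words = ["".join(letters) for letters in product("01", repeat=n)]
words = random.sample(all_words, M)            # a hypothetical code of M distinct words
errors = [random.uniform(0.0, 0.2) for _ in words]   # hypothetical per-word error probabilities
P_e = sum(errors) / M

members = best_prefix_class(words, errors, d, A)
short_code = [words[i][d:] for i in members]   # delete the common first d letters
short_error = sum(errors[i] for i in members) / len(members)
print(len(short_code), short_error, 2 * P_e)
```

The counting argument guarantees that the list of large classes is never empty and that the chosen class has average error at most $2 P_e$, whatever the input.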

Let $d = kn$, $E = -\frac{1}{n} \log P_e$ and $R = \frac{1}{n} \log M$. Then we find for the
new code, as $n \to \infty$,

$$R_1 \to \frac{1}{1-k} (R - k \log A)$$

$$E_1 \to \frac{1}{1-k} E$$

This means that on the $E,R$ plot, if a straight line be passed through
the curve at $E,R$ and through the point $E = 0$, $R = \log A$, then the $E,R$
curve lies below (or on) the straight line to the right of the given
point and above (or on) the straight line to the left of the given point.
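
As a small numerical illustration of this straight line, the sketch below maps a point $(R, E)$ to the limiting point of the shortened code and checks that it is collinear with $(R, E)$ and $(\log A, 0)$. The function name and the sample values are assumptions made for the example, and natural logarithms are assumed.

```python
import math


def shortened_code_point(R, E, k, A):
    """Limiting rate and reliability (R1, E1) after deleting a fraction k of
    the letters: R1 = (R - k log A) / (1 - k), E1 = E / (1 - k)."""
    if not 0 <= k < 1:
        raise ValueError("k must satisfy 0 <= k < 1")
    return (R - k * math.log(A)) / (1 - k), E / (1 - k)


R, E, k, A = 0.4, 0.1, 0.25, 2                 # illustrative values only
R1, E1 = shortened_code_point(R, E, k, A)

# Cross product of (R1 - R, E1 - E) with (log A - R, 0 - E); zero means collinear.
cross = (R1 - R) * (0.0 - E) - (E1 - E) * (math.log(A) - R)
print(R1, E1, cross)
```

The new point lies to the left of the given one (lower rate, higher reliability), which is the side on which the curve is bounded from below by the line.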


A Result for the Memoryless Feedback Channel.

Theorem: Given a code for a memoryless feedback channel,
with block length $n$, probability of error $P_e$ and number of
messages $M$, we can find a code with block length $n-d$, probability
of error $\le 2 P_e$ and number of messages $\ge M / 2(ab\, p_{\max})^d$, where
$a$ is the number of letters in the input alphabet, $b$ that for the
output alphabet, $p_{\max}$ the largest transition probability and
$d$ any desired integer from 0 to $n$.

Proof: For the given code consider the set of transmission
"starts" of length $d$. Input letter $x_1$, say, is received as $y_1$;
next $x_2$ is transmitted and received as $y_2$, etc., to $x_{d-1}$ as $y_{d-1}$
and finally $x_d$ as $y_d$. There are $(ab)^d$ possible starts $(x_1, y_1,
x_2, y_2, \ldots, x_d, y_d)$ of length $d$. In the given code let these
occur with probabilities $q_1, q_2, \ldots, q_T$ (where $T = (ab)^d$).
Let the final probability of error for each of these be $P_{ei}$
$(i = 1, 2, \ldots, T)$. Then $\sum_i q_i P_{ei} = P_e$. Using our combinatorial
lemma there is at least one of these starts, $\alpha$, with a
$q_\alpha \ge 1/2T$ and with a $P_{e\alpha} \le 2 P_e$. Any message which
can cause a particular start (such as start $\alpha$) leads to this
start with probability $g = p_{x_1}(y_1)\, p_{x_2}(y_2) \cdots p_{x_d}(y_d)$, where
the $x_i$ and $y_i$ are those for the start. The total probability
of the start is then $1/M$ times the number of messages that can cause
the start times this product. For start $\alpha$ this total probability
is $\ge 1/2T$, hence the number of messages must be at least

$$\frac{1/2T}{(1/M)\, g} \ge \frac{M}{2T\, p_{\max}^d} = \frac{M}{2 (ab\, p_{\max})^d}.$$

The code to be used of length $n-d$ consists of the messages
in the group $\alpha$, sending only the last $n-d$ letters as though they
had started in the manner leading to start $\alpha$. We have seen that
the number available is as stated in the theorem. If the detection
system used is that for the original code and all received signals
not decoded as one of the messages in the group are counted as an
error, then the probability of error will be exactly $P_{e\alpha} \le 2 P_e$.
A suitable distribution of these other received words can only
improve this value.



Continuity of $P_{e\,opt}$ as a function of transition probabilities.

Theorem: The probability of error for the optimal code of length $n$ in
the channel defined by $p_i(j)$, that is, $P_{e\,opt}(p_i(j), n)$, is a continuous
function of $p_i(j)$ in the region $R$ defined by $\sum_j p_i(j) = 1$ $(i = 1, 2, \ldots, a)$.

Proof: For a given finite number of input words and a finite number of
output words there are a finite number of codes containing $M$ words.
There is also only a finite number of decoding systems for each of these
codes; hence there is a finite number of complete systems. Let these
be numbered and let the probability of error for the $i$th one be
$P_{ei}$ $(i = 1, 2, \ldots, f)$. Then

$$P_{e\,opt}(p_i(j), n) = \min_i P_{ei}(p_i(j), n)$$

Each $P_{ei}$ is a continuous function of $p_i(j)$ in the region $R$. In fact each
$P_{ei}$ is a multinomial in these probabilities, namely, $1/M$ times the sum
of the probabilities of each code word being carried to all the received
words which are not decoded as the word in question. The minimum of a
finite number of continuous functions is a continuous function, proving
the theorem. In fact, we may say more strongly that $P_{e\,opt}$ is made up
of a finite number of multinomials, each representing $P_{e\,opt}$ in a region
of the $p_i(j)$ space.
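
The finiteness used in this proof makes the optimal error probability directly computable for very small cases. The sketch below is an illustration under stated assumptions: it enumerates every code of $M$ distinct input words, and for each code it uses maximum likelihood decoding as a shortcut instead of enumerating all decoding systems (for equiprobable words this attains the minimum over decoders). The function names and the binary symmetric channel example are hypothetical.

```python
import itertools
import math


def optimal_error(P, n, M):
    """Minimum error probability over all codes of M distinct words of length n
    for the memoryless channel with transition matrix P[i][j] = p_i(j)."""
    a, b = len(P), len(P[0])
    inputs = list(itertools.product(range(a), repeat=n))
    outputs = list(itertools.product(range(b), repeat=n))

    def likelihood(u, v):
        # Probability that input word u is received as output word v.
        return math.prod(P[x][y] for x, y in zip(u, v))

    best = 1.0
    for code in itertools.combinations(inputs, M):
        # Maximum likelihood decoding: each v is credited to its likeliest code word.
        p_correct = sum(max(likelihood(u, v) for u in code) for v in outputs) / M
        best = min(best, 1.0 - p_correct)
    return best


def bsc(p):
    """Transition matrix of a binary symmetric channel with crossover probability p."""
    return [[1 - p, p], [p, 1 - p]]


print(optimal_error(bsc(0.1), n=3, M=2))
```

Evaluating the function at nearby channel parameters gives nearby values, as the theorem asserts; the result is a minimum of finitely many polynomials in the $p_i(j)$.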



Codes of a fixed composition.

Consider words of length $n$. Suppose an input word has $\lambda_i n$ occurrences
of the $i$th letter $(i = 1, 2, \ldots, a)$. Then we will call the vector $\lambda_i$ the
composition of the word. Similarly, if an output word has $\mu_i n$ occurrences
of the $i$th letter $(i = 1, 2, \ldots, b)$, then $\mu_i$ is the composition of this
output word. The number of different compositions of input words is
$\binom{n+a-1}{a-1} \le n^a$. The number of different compositions of output words is
similarly $\binom{n+b-1}{b-1} \le n^b$. We may consider, for a given channel, codes in
which we artificially restrict the input words to a particular
composition, say $\lambda_i$. We can then consider problems of finding the
optimal code and its optimal probability of error and reliability. This
reliability we denote by $E(R, \lambda_i, n)$. We will now show the following:

$$\frac{1}{n} \log 2 + \max_{\lambda} E\!\left(R - \frac{1}{n} \log 2n^a,\, \lambda_i,\, n\right) \ge E(R, n) \ge \max_{\lambda} E(R, \lambda_i, n)$$

The right hand relation is clear since the right hand member is the best
reliability for codes all of whose words have the same composition,
while $E(R,n)$ is the best reliability with no such restriction and consequently
is at least as good. The left hand relation is proved as follows. In a
code with $e^{Rn}$ input words distributed over not more than $n^a$ different
compositions, the average composition has at least $e^{Rn}/n^a$ input words.
Using a combinatorial principle previously proved,
at least half of the words are in composition classes which contain at
least half this average number of words. When a word is in such a class,
the probability of error is at least as great as if there were no other
input words (except those in the class) and the code was the best possible
for the number in the class. The probability of error would again be
reduced if the composition in question were that which has the smallest
probability of error for the given number of input words. Translating
into reliability and rate, the reliability for the cases at hand is not
greater than $\max_\lambda E(R - \frac{1}{n} \log 2n^a, \lambda_i, n)$. Since these words occur at least
half the time, the reliability $E(R,n)$ for the original code satisfies
the left inequality.
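
The counting of compositions can be checked by brute force for small $n$ and $a$. The sketch below is illustrative only: it records compositions as integer letter counts rather than the fractions $\lambda_i$, and the helper name `composition_classes` is an assumption.

```python
from collections import Counter
from itertools import product
from math import comb


def composition_classes(n, a):
    """Map each composition (tuple of letter counts) to the number of words
    of length n over an a-letter alphabet that have it."""
    classes = Counter()
    for word in product(range(a), repeat=n):
        counts = Counter(word)
        classes[tuple(counts[letter] for letter in range(a))] += 1
    return classes


n, a = 4, 3                                    # illustrative sizes only
classes = composition_classes(n, a)
num_compositions = len(classes)
largest_class = max(classes.values())
print(num_compositions, comb(n + a - 1, a - 1), n ** a, largest_class)
```

Since all $a^n$ words fall into that many classes, some class holds at least the average number of words, which is the pigeonhole step used for the left hand inequality.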




Relation of $P_e$ to $\rho$

Theorem: Suppose a particular code has $e^{nR}$ words and the distribution
function for the information $I$ is $\rho(x)$ (the words being used with equal
probability). Then the optimal detection system for this code gives a
probability of error $P_e$ satisfying the inequalities

$$\frac{1}{2}\, \rho\!\left(R - \frac{1}{n} \log 2\right) \le P_e \le \rho\!\left(R - \frac{1}{n} \log 2\right)$$

Proof: We first prove the lower bound. By definition of the function
$\rho$, the probability is $\rho(R - \frac{1}{n} \log 2)$ that

$$\frac{1}{n} \log \frac{p(u,v)}{p(u)\,p(v)} \le R - \frac{1}{n} \log 2$$

or

$$\frac{p(u,v)}{p(u)\,p(v)} \le \frac{1}{2}\, e^{nR}$$

or (using the fact that $p(u) = e^{-nR}$)

$$p_v(u) \le \frac{1}{2}.$$

Now fix attention on those pairs $(u,v)$ for which this inequality $p_v(u) \le 1/2$
is true, and imagine the corresponding $(u,v)$ lines to be marked in black
and all other $(u,v)$ connecting lines marked in red. We divide the $v$
points into two classes: $C_1$ consists of those $v$'s which are decoded into
$u$'s connected by a red line (and also any $v$'s which are decoded into $u$'s
not connected to the $v$'s); $C_2$ consists of $v$'s which are decoded into $u$'s
connected by a black line. We have established that with probability
$\rho(R - \frac{1}{n} \log 2)$ the $(u,v)$ pair will be connected by a black line. The
$v$'s involved will fall into the two classes $C_1$ and $C_2$ with probability
$\rho_1$, say, and $\rho_2 = \rho(R - \frac{1}{n} \log 2) - \rho_1$. Whenever the $v$ is in $C_1$ an error
is produced since the actual $u$ was one connected by a black line and the
decoding is along a red line (or to a disconnected $u$). Thus these cases
give rise to a probability $\rho_1$ of error. When the $v$ in question is in
class $C_2$, we have $p_v(u) \le 1/2$. This means that with at least an equal
probability these $v$'s can be obtained through other $u$'s than the one in
question. If we sum for these $v$'s the probabilities of all pairs $p(u,v)$
except that corresponding to the decoding system, then we will have a
probability at least $\rho_2/2$ and all of these cases correspond to incorrect
decoding. In total, then, we have a probability of error given by

$$P_e \ge \rho_1 + \frac{\rho_2}{2} \ge \frac{1}{2}\, \rho\!\left(R - \frac{1}{n} \log 2\right).$$


We now prove the upper bound. Consider the decoding system defined
as follows. If for any received $v$ there exists a $u$ such that $p_v(u) > \frac{1}{2}$,
then the $v$ is decoded into that $u$. Obviously there cannot be more than
one such $u$ for a given $v$ since the sum of these would imply a probability
greater than one. If there is no such $u$ for a given $v$, the decoding is
irrelevant to our argument. We may, for example, let such $v$'s all be decoded
into the first word in the input code. The probability of error, with this
decoding, is then less than or equal to the probability of all $(u,v)$
pairs for which $p_v(u) \le \frac{1}{2}$. That is,

$$P_e \le \sum_S p(u,v) \qquad \text{(where $S$ is the set of pairs $(u,v)$ with $p_v(u) \le \tfrac{1}{2}$)}$$

The condition $p_v(u) \le \frac{1}{2}$ is equivalent to $\frac{p(u,v)}{p(v)} \le \frac{1}{2}$, or, again, to
$\frac{p(u,v)}{p(u)\,p(v)} \le \frac{1}{2}\, p(u)^{-1} = \frac{1}{2}\, e^{nR}$. This is equivalent to the condition

$$\frac{1}{n} \log \frac{p(u,v)}{p(u)\,p(v)} \le R - \frac{1}{n} \log 2.$$

The sum $\sum p(u,v)$ where this is true is, by definition, the distribution
function of $\frac{1}{n} \log \frac{p(u,v)}{p(u)\,p(v)}$ evaluated at $R - \frac{1}{n} \log 2$, that is,

$$P_e \le \sum_S p(u,v) = \rho\!\left(R - \frac{1}{n} \log 2\right).$$
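
Both inequalities of this theorem can be checked exhaustively on a toy example. The sketch below is an illustration under assumptions not in the original: a binary symmetric channel with crossover probability `p`, a hypothetical three-word code, natural logarithms, and the helper name `rho_and_error`.

```python
import itertools
import math


def rho_and_error(code, p):
    """For equiprobable code words on a binary symmetric channel, return
    (rho(R - log(2)/n), P_e), where P_e is the error of the optimal decoder."""
    n, M = len(code[0]), len(code)
    R = math.log(M) / n
    threshold = R - math.log(2) / n
    rho = 0.0
    p_error = 0.0
    for v in itertools.product((0, 1), repeat=n):
        joint = []
        for u in code:
            flips = sum(x != y for x, y in zip(u, v))
            joint.append(p ** flips * (1 - p) ** (n - flips) / M)   # p(u, v)
        p_v = sum(joint)
        p_error += p_v - max(joint)        # optimal decoder keeps the likeliest word
        for p_uv in joint:
            info = math.log(p_uv / (p_v / M)) / n   # (1/n) log p(u,v) / (p(u) p(v))
            if info <= threshold:                   # same as p_v(u) <= 1/2
                rho += p_uv
    return rho, p_error


code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1), (0, 0, 1, 1, 1)]   # hypothetical code
rho, p_error = rho_and_error(code, p=0.1)
print(0.5 * rho, p_error, rho)
```

The printed error probability falls between the two other values, in line with the theorem.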


Bound on $P_e$ for Random Code by Simple Threshold Argument

Theorem: Suppose some $p(u)$ for $u$ words of length $n$ gives
rise to a distribution $\rho(I)$. Then given any $R$ and any $\epsilon > 0$
there exists a selection of $e^{nR}$ input words and a decoding
system such that if these words are used with equal probability,
the probability of error $P_e$ is bounded by

$$P_e \le \rho(R + \epsilon) + \frac{1}{2}\, e^{-n\epsilon}$$

Proof: For a given $R$ and $\epsilon$ consider the pairs $(u,v)$ of
input and output words and define the set $S$ to consist of those
pairs for which $\log \frac{p(u,v)}{p(u)\,p(v)} > n(R + \epsilon)$. Thinking of the $u$'s
and $v$'s as two sets of points with connecting lines
between, we can imagine the set of lines corresponding to the
set $S$ to be colored red. When the $u$'s are chosen with probabilities
$p(u)$, then the probability that the $(u,v)$ pair will
belong to the set $S$ is, by definition of $\rho$, equal to $1 - \rho(R + \epsilon)$.

Now consider the ensemble of signalling codes obtained
in the following manner. The integers $1, 2, 3, \ldots, M = e^{nR}$
are associated independently with the different possible input
sequences $u_1, u_2, \ldots, u_B$ with probabilities $p(u_1), p(u_2), \ldots,
p(u_B)$. This produces an ensemble of codes each using $M$ (or less)
input words. If there are $B$ different input words $u_i$, there
will be exactly $B^M$ different codes in this ensemble corresponding
to the $B^M$ different ways we can associate $M$ integers with $B$ input
words. These codes have different probabilities. Thus the
(highly degenerate) code in which all integers are mapped into
input word $u_1$ has probability $p(u_1)^M$. A code in which $d_k$ of the
integers are mapped into $u_k$ has probability $\prod_k p(u_k)^{d_k}$. We will
be concerned with an average probability of error for this
ensemble of codes. By this we mean the average probability
of error when these codes are weighted according to the
probabilities we have just defined. We imagine that in using
one of these codes each integer is used with probability $1/M$.
Note that for some particular selections, several integers may fall
on the same input word. This input word is then used with higher
probability than the others.




In any particular code of the ensemble, our decoding
procedure will be defined as follows. If a received $v$ sequence
has no red line coming into it (for this $v_i$ there is no $(u, v_i)$
pair in the set $S$) then we decode (conventionally) as message
1. If there is exactly one integer mapped into a $u$ connected
by a red line to this $v_i$, we decode as the corresponding
integer. If there is more than one such integer, we decode
as the smallest such integer.

With any particular code in this ensemble the probability
of using the different $u_i$ will not, in general, be given by
$p(u_i)$. However, if we average over the full ensemble, then each
$u_i$ will be used with the probability $p(u_i)$, since integers were
mapped into $u_i$ in constructing the ensemble with just this
probability. This means that in the ensemble average, a pair
$(u,v)$ will also occur with the probability $p(u,v)$.

Now let us compute the average probability of error in
this full ensemble of codes. In the ensemble a $(u,v)$ pair will
not belong to the set $S$ with the probability $\rho(R + \epsilon)$. We
suppose, pessimistically, that each case of this sort produces
an error. The remaining $1 - \rho(R + \epsilon)$ of the time, the $(u,v)$
pair does belong to the set $S$ and consequently

$$\log \frac{p(u,v)}{p(u)\,p(v)} > n(R + \epsilon)$$

$$p_v(u) > p(u)\, e^{n(R + \epsilon)}$$

Fixing $v$ at $v_i$, say, we now sum this inequality over all $u$'s such
that $(u, v_i)$ belongs to the set $S$. This subset of $u$'s we call $S_i$.
Thus we obtain

$$\sum_{u \in S_i} p_{v_i}(u) > e^{n(R + \epsilon)} \sum_{u \in S_i} p(u)$$

Now the left member is clearly less than or equal to one, it
being the conditional probability that $v_i$ was caused by a
member of $S_i$. The sum in the right member we will denote
by $Q_i$. It is the total unconditional probability for all
members of $S_i$, that is, for all $u$'s connected to $v_i$ by red lines.
Using these we obtain

$$Q_i < e^{-n(R + \epsilon)}.$$


Now consider the conditional probability in the ensemble
of codes of an error in decoding when $v_i$ is received and the
correct message is connected to this $v_i$ by a red line. This
probability $P_{ei}$ is given by

$$P_{ei} = \frac{\sum_{K=1}^{M} (K-1) \binom{M}{K} Q_i^K (1 - Q_i)^{M-K}}{\sum_{K=1}^{M} K \binom{M}{K} Q_i^K (1 - Q_i)^{M-K}}$$

The reason for this is that, conditional on the given information,
the probability in the ensemble of codes that the result was
caused by one with exactly $K$ integers coded into the $S_i$ subset
is

$$\frac{K \binom{M}{K} Q_i^K (1 - Q_i)^{M-K}}{\sum_{K=1}^{M} K \binom{M}{K} Q_i^K (1 - Q_i)^{M-K}}$$

In the case of such a code, the probability of error in decoding
is $\frac{K-1}{K}$. Multiplying by this and summing on $K$ gives the
$P_{ei}$ expression above. This may be evaluated easily by noting
that the denominator is the expectation of a binomial while the
numerator is this same expectation less $\sum_{K=1}^{M} \binom{M}{K} Q_i^K (1 - Q_i)^{M-K} = 1 - (1 - Q_i)^M$.
Hence, we have

$$P_{ei} = 1 - \frac{1 - (1 - Q_i)^M}{M Q_i} \le \frac{1}{2} (M-1)\, Q_i \le \frac{1}{2}\, e^{-n\epsilon}$$

provided $e^{-n\epsilon} < 1$, since then $M Q_i < 1$ and the alternating
binomial expansion is decreasing in absolute value and hence
may be overestimated by dropping the terms after $\frac{1}{2}(M-1) Q_i$.

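Now since $P_{ei} \le \frac{1}{2} e^{-n\epsilon}$ for each $i$, the probability
of error when the $(u,v)$ pair belongs to set $S$ is less than
$\frac{1}{2} e^{-n\epsilon}$. Hence, the unconditional probability of error over
the ensemble of codes satisfies

$$P_e \le \rho(R + \epsilon) + \frac{1}{2}\, e^{-n\epsilon}.$$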

This being the average probability of error over the ensemble
of codes, there must be at least one particular code in the
ensemble with a probability of error this low. This proves
the theorem. More generally, one may say that at least
half the codes in the ensemble have a probability of error
less than twice this bound and at least a fraction $\delta$ have
a probability of error less than $\frac{1}{1-\delta}$ times this bound.
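
To get a feel for the size of the threshold bound, the sketch below evaluates $\rho(R+\epsilon) + \frac{1}{2} e^{-n\epsilon}$ for a binary symmetric channel with equiprobable input words. Everything specific here is an assumption made for illustration: the channel, the parameter values, natural logarithms, the function names, and the closed form $1 - [1 - (1-Q)^M]/(MQ)$, which is our reading of the binomial-expectation ratio described in the proof.

```python
import math


def rho_bsc(x, n, p):
    """Pr[I <= x], where I = (1/n) log p(v|u)/p(v) in nats per letter, for a
    binary symmetric channel (crossover p) with all 2**n input words equiprobable."""
    total = 0.0
    for k in range(n + 1):                     # k = number of letters received in error
        info = math.log(2) + (k * math.log(p) + (n - k) * math.log(1 - p)) / n
        if info <= x:
            total += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


def threshold_bound(R, eps, n, p):
    """Right-hand side rho(R + eps) + (1/2) exp(-n eps) of the theorem."""
    return rho_bsc(R + eps, n, p) + 0.5 * math.exp(-n * eps)


def conditional_error(M, Q):
    """Assumed closed form for P_ei: 1 - [1 - (1 - Q)**M] / (M Q)."""
    return 1.0 - (1.0 - (1.0 - Q) ** M) / (M * Q)


bound = threshold_bound(R=0.3, eps=0.05, n=200, p=0.05)
print(bound, conditional_error(50, 0.01), 0.5 * 49 * 0.01)
```

The second and third printed numbers compare the conditional error with the estimate $\frac{1}{2}(M-1)Q$ obtained by truncating the alternating expansion.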


A bound on $P_e$ for a random code.

Theorem: Given a distribution $p(u)$ for input words of length $n$ which
produces the information distribution $\rho(x)$, then the random ensemble of
codes with $e^{Rn}$ words based on $p(u)$ has an average probability of error
satisfying

$$P_e \le \rho(R) + e^{Rn} \int_R^\infty e^{-nx}\, d\rho(x) = n\, e^{Rn} \int_R^\infty \rho(x)\, e^{-nx}\, dx$$

Hence there exist particular codes with $e^{Rn}$ members and this probability
of error.

Proof: Construct the random ensemble of codes, each code having $e^{Rn}$
members and based on the given input distribution $p(u)$. We wish to
calculate a bound for the average probability of error over this ensemble.
In the ensemble, pairs $(u,v)$ of transmitted and received words occur
with the same probabilities as in the original situation produced by giving
the input words probabilities $p(u)$. We calculate the error probability
by an integration on the variable $x$ occurring in the information
distribution $\rho(x)$. The probability that a $(u,v)$ pair are such as to give an $x$
lying in the interval $x_i, x_{i+1}$ is $\rho(x_{i+1}) - \rho(x_i)$. For such a
$(u,v)$ pair

$$x_i < \frac{1}{n} \log \frac{p(u,v)}{p(u)\,p(v)} \le x_{i+1}$$

or

$$p(u)\, e^{n x_i} < p_v(u) \le p(u)\, e^{n x_{i+1}}$$

If we sum the left terms of this inequality over all words $u$ in set $S$, say,
with greater conditional probability, we obtain

$$e^{n x_i} \sum_S p(u) < \sum_S p_v(u) \le 1$$

$$Q = \sum_S p(u) < e^{-n x_i}$$

since the total probability of set $S$ conditional on $v$ cannot exceed 1.
Our detection system will be to choose among the possible words in a
particular code when $v$ is received that one for which $p_v(u)$ (in the original
probability system) was greatest. A $(u,v)$ pair in the interval $x_i, x_{i+1}$
will be safe in a code in the ensemble if the set $S$ for that $v$ is empty
(apart from the particular $u$ which produced the pair). In the ensemble,
the probability of error from a $(u,v)$ pair may be calculated as in the
simple threshold case. We obtain an upper bound from that argument

$$e^{Rn}\, Q < e^{Rn}\, e^{-n x_i} = e^{n(R - x_i)}.$$

We now can overestimate the probability of error by summing the
probability of the $(u,v)$ pair being in the interval $x_i, x_{i+1}$ multiplied
by the probability of words in the set $S$ for such a case. Note also that
our bound for the latter is one for $x < R$; for smaller $x$'s we use the bound
one rather than $e^{n(R - x)}$. As the intervals $x_i, x_{i+1}$ approach zero
length, our bound approaches the integral form:

$$P_e \le \rho(R) + \int_R^\infty e^{n(R - x)}\, d\rho(x)$$

Integrating by parts

$$P_e \le \rho(R) + e^{Rn} \Big[ \rho(x)\, e^{-xn} \Big]_R^\infty + n\, e^{Rn} \int_R^\infty \rho(x)\, e^{-nx}\, dx = n\, e^{Rn} \int_R^\infty \rho(x)\, e^{-nx}\, dx$$


Corollary: Under the conditions of the theorem, suppose a maximum for
$x \ge R$ of $\rho(x)\, e^{-xn}$ occurs at $x = R_m$. Then

$$P_e \le e^{(R - R_m) n}\, \rho(R_m) \left[ \log e\, \rho(R_m)^{-1} + n(R_m - R) \right]$$

In particular, if $R_m = R$

$$P_e \le \rho(R)\, \log e\, \rho(R)^{-1}$$

Proof: Using the second formula in the theorem, we have

$$P_e \le n\, e^{Rn} \int_R^\infty e^{-xn}\, \rho(x)\, dx$$

The maximum of the integrand by the conditions of the corollary is $e^{-R_m n} \rho(R_m)$.
We also have an upper bound for the integrand $e^{-xn}$, since $\rho(x) \le 1$.
These two bounds cross at $x = a$, where $a$ satisfies

$$e^{-an} = e^{-R_m n}\, \rho(R_m)$$

Replacing the integrand by $e^{-R_m n} \rho(R_m)$ for $x \le a$ and by $e^{-xn}$ for $x > a$,
we obtain the upper bound for $P_e$

$$P_e \le e^{(R - R_m) n}\, \rho(R_m) \left[ \log e\, \rho(R_m)^{-1} + n(R_m - R) \right].$$

Setting $R_m = R$ gives the second bound.
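
The arithmetic behind the corollary can be written out step by step. The lines below are a sketch under the assumptions stated in the proof: the integral bound $P_e \le n e^{Rn} \int_R^\infty \rho(x) e^{-nx} dx$, the two overestimates of the integrand, and the crossing point $a$.

```latex
\begin{align*}
e^{-an} &= e^{-R_m n}\rho(R_m)
  \;\Longrightarrow\; a = R_m + \tfrac{1}{n}\log\rho(R_m)^{-1},\\
P_e &\le n e^{Rn}\Bigl[(a-R)\,e^{-R_m n}\rho(R_m) + \int_a^\infty e^{-nx}\,dx\Bigr]\\
    &= n e^{Rn}\Bigl[(a-R)\,e^{-R_m n}\rho(R_m) + \tfrac{1}{n}\,e^{-an}\Bigr]\\
    &= e^{(R-R_m)n}\rho(R_m)\bigl[n(a-R) + 1\bigr]\\
    &= e^{(R-R_m)n}\rho(R_m)\bigl[\log e\,\rho(R_m)^{-1} + n(R_m-R)\bigr].
\end{align*}
```

The last step uses $n(a - R) + 1 = n(R_m - R) + \log \rho(R_m)^{-1} + \log e$.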
The Feinstein Bound

It is interesting to compare these results with the bound on the
probability of error found by Feinstein. Using a different method of
proving the coding theorem for a noisy channel, he found the following
upper bound for the probability of error:

V      L2              *  (51}  J  - u 

in which

$n$ = block length of the code
$C$ = channel capacity
$R = \frac{1}{n} \log$ (number of code words)

$\delta_1$ can be taken to be Prob $\left[\, \left| H(X/Y) + \frac{1}{n} \log p(u/v) \right| > \epsilon_1 \right]$

$\delta_2$ can be taken to be Prob $\left[\, \left| H(X) + \frac{1}{n} \log p(u) \right| > \epsilon_2 \right]$

In using the values above for $\delta_1$ and $\delta_2$ we are using the most favorable
values to give a low bound on $P_e$. The bound $U$ above may be approximated
within a factor of 2 by a somewhat simpler expression as follows:

r^jL  +6ij  -  u  --rrq  Le  *6i 

The left inequality is obtained by squaring the expression for $U$ and
dropping the necessarily positive middle term. The right inequality
follows from noting that $2AB \le A^2 + B^2$, so that $U$ is increased by deleting
the middle term and doubling the squared terms.

The bound is somewhat simplified in the case where $p(u)$, the probability
of input word $u$ to achieve channel capacity, is constant at $2^{-nH(u)}$. We
then have $\epsilon_2 = 0$ and $\delta_2 = 0$. This situation occurs, for example, in
channels with uniform input letters, as we have seen previously, and in
particular in the binary symmetric channel. In these cases, the
inequalities simplify to

$$2^{-n(\Delta - \epsilon_1)} + \delta_1 \le U \le 2 \left( 2^{-n(\Delta - \epsilon_1)} + \delta_1 \right),$$

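where we define $\Delta = C - R$ as the discrepancy between channel capacity and
rate for the code. Note also in this case that

$$\delta_1 = \text{Prob} \left[\, \left| H(X/Y) + \frac{1}{n} \log p(u/v) \right| > \epsilon_1 \right]$$

$$= \text{Prob} \left[\, \left| H(X/Y) - H(X) + \frac{1}{n} \log \frac{p(u,v)}{p(u)\,p(v)} \right| > \epsilon_1 \right]$$

$$= \text{Prob} \left[ \frac{1}{n} \log \frac{p(u,v)}{p(u)\,p(v)} \le C - \epsilon_1 \right]$$

$$= \rho(C - \epsilon_1)$$

where $\rho$ is the distribution function for information that we have used
previously. Making the change of variable $\epsilon = \Delta - \epsilon_1$, the inequalities
for $U$ become

$$2^{-n\epsilon} + \rho(R + \epsilon) \le U \le 2 \left( 2^{-n\epsilon} + \rho(R + \epsilon) \right)$$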

This may be compared with the inequality (    ) found for the random code
by the simple threshold method. It will be seen that they are within at
worst a factor of 2 of each other. Since the bound (    ) leads in the
binary symmetric channel to a reliability bound considerably poorer than
the true reliability curve, the same may be said of the Feinstein bound.
We have made no approximations in estimating the reliability bound from
the inequality obtained by Feinstein. It follows that either the type
of code (or, more precisely, the poorest code that can be constructed by
his method) is considerably poorer in reliability than the random code or
else that the bound (    ) is a relatively poor estimate of the error
probability of these codes (that is, that approximations made prior to
this formula were sufficiently crude as to cause this difference in the
reliability bounds). Which of these is actually the case we have not
determined.
Relations Between Reliability and Minimum Word Separation

In this section we prove some results relating probability of error
with the minimum separation between words in the code. These results
show that when the signalling rate $R$ is very small the reliability is
approximately the minimum separation. As a consequence, to obtain a good
code for $R$ near zero, the essential feature is to choose a set of code
words such that the minimum separation between any pair is as large as
possible.

Theorem: For any code with rate $R$ and maximum likelihood detection

$$\Delta_{\min} - R \le -\frac{1}{n} \log P_e \le \Delta_{\min} + R - \frac{1}{n} \log 2$$

where $\Delta_{\min}$ is the minimum separation between words of the code. Hence,
for any code sequence with rate approaching zero and maximum likelihood
detection, the reliability approaches the minimum separation.

Corollary: $E(0) = \lim_{R \to 0} \lim_{n \to \infty} \max \Delta_{\min}$, the maximum being taken
over codes of rate $R$.

Proof: Let two words at minimum distance be $W_1$ and $W_2$, and let
$M = e^{nR}$ be the number of words in the code. The probability of error
for the code is certainly at least $2/M$ times the probability of error
when $W_1$ or $W_2$ is used, since $2/M$ of the time one or the other of these
will occur. This latter probability is certainly at least what it would be if
none of the other words (except $W_1$ and $W_2$) were present, and the detection
were by maximum likelihood. This last is $e^{-n\Delta_{\min}}$. Thus

$$P_e \ge \frac{2}{M}\, e^{-n\Delta_{\min}}.$$

Taking the logarithm and dividing by n, we obtain the upper bound.

The lower bound is obtained by noting that the probability of error
when a particular word is transmitted can be calculated by summing the
probabilities of being interpreted as each other word. These terms are
overestimated by taking each other word to be at separation $\Delta_{\min}$ and
adding these contributions distinctively. This amounts to adding $M(M-1)/2$
contributions (one for each pair of words) and giving each the value just
obtained, $\frac{2}{M}\, e^{-n\Delta_{\min}}$, for the worst pair. Thus

$$P_e \le \frac{2}{M}\cdot\frac{M(M-1)}{2}\, e^{-n\Delta_{\min}} \le e^{nR}\, e^{-n\Delta_{\min}}.$$

By taking the logarithm and dividing by n we obtain the desired result.
Since for $R \to 0$ the two bounds converge to $\Delta_{\min}$, the second statement
of the theorem is true. The corollary results on combining the theorem with
the definition of the reliability function E.

Corollary: Let $\Delta_{\min}(R,n)$ be the minimum separation between words in
the code of rate R, block length n, which maximizes this minimum distance
for a given channel. Then the reliability characteristic E(R) for the
channel satisfies

$$\limsup_{n\to\infty} \Delta_{\min}(R,n) - R \;\le\; E(R) \;\le\; \limsup_{n\to\infty} \Delta_{\min}(R,n) + R.$$

Proof: For the right inequality, note that for any sequence of codes
of increasing block length n the $\Delta_{\min}(n) \le \Delta_{\min}(R,n)$ (since $\Delta_{\min}(R,n)$ is
the largest possible $\Delta_{\min}$ for the given R and n). Hence for sufficiently
large n, all $\Delta_{\min}$ in the sequence are less than $\limsup \Delta_{\min}(R,n) + \epsilon$ (for any
positive $\epsilon$). Now, using the theorem (and noting that $\frac{1}{n}\log 2 \to 0$), we
obtain $E \le \limsup \Delta_{\min}(R,n) + R + \epsilon$. This being true for any positive $\epsilon$, it is
true for $\epsilon = 0$.

The left inequality also follows easily from the theorem. Take a
sub-sequence from the sequence of codes giving $\Delta_{\min}(R,n)$ which actually approaches
$\limsup \Delta_{\min}(R,n)$. Applying the lower bound of the previous theorem to this
sub-sequence of codes, we obtain the left inequality above.


Our next result shows that by selecting our codes the R in the upper
bounds of these results can be eliminated.

Theorem: Given a code sequence approaching rate R and reliability E,
there exists an expurgated sub-sequence approaching the same rate R and
reliability E and with $E \le \lim_{n\to\infty} \Delta_{\min}(n)$, where $\Delta_{\min}(n)$ is the minimum
separation between words in the nth code in the expurgated sub-sequence.

Proof: For any given $\Delta$ perform the following operation. Delete, in
each code of the given sequence, one of the points which has a nearest
neighbor (provided this separation is less than or equal to $\Delta$). Next,
delete one of the points in the resulting code which are closest together,
and so on up to the point at which no points remain with a separation less
than or equal to $\Delta$. This is done for all the codes in the sequence. For
each $\Delta$, either there exists an $\epsilon > 0$ for which an infinite sub-sequence of
the codes remaining have a fraction at least $\epsilon$ of the original points left,
or such an $\epsilon$ does not exist. This divides the values of $\Delta$ into two Dedekind
classes and gives a minimum division point $\Delta_0$ such that for $\Delta < \Delta_0$ the $\epsilon$
exists and for $\Delta > \Delta_0$ it does not.

Choose any small interval $\delta > 0$ and consider the code sequence resulting
for $\Delta = \Delta_0 - \delta$. The rate for this sequence is at least $R + \frac{1}{n}\log\epsilon$ and
hence approaches R as $n \to \infty$. Furthermore, almost all points in the codes
remaining in the sequence have a neighbor in the interval $\Delta_0 - \delta$ to $\Delta_0$, by the
construction of $\Delta_0$. Finally, the E for these codes must be the same as
the original E since errors due to points retained are only increased at
most in the ratio $1/\epsilon$, due to increased usage of these points. This will not
affect E. This code sequence is then ideal and close to uniform in nearest
neighbor separation. Almost all points have a nearest neighbor between
$\Delta_0 - \delta$ and $\Delta_0$, and $\delta$ is arbitrarily small. The argument about $P_e$ given in
the preceding theorem can now be improved since almost all points have such
a near neighbor. Thus we get the inequality without the R term

$$E \le \lim_{n\to\infty} \Delta_{\min}(n) + \delta$$

for any $\delta > 0$, and hence we can obtain a sub-sequence for which $E \le \lim \Delta_{\min}(n)$,
as stated in the theorem.

Inequalities for Decodable Codes

Consider codes of the following sort. There are a basic letters
and s words $W_1, W_2, \ldots, W_s$ formed of sequences of the letters. These
words have lengths $\ell_1, \ell_2, \ldots, \ell_s$ (not necessarily equal). The code
is supposed to be decodable, by which we mean that any finite sequence of
letters can be broken down into words in at most one way.

Theorem: For such a decodable code we have

$$\sum_i a^{-\ell_i} \le 1 \qquad (1)$$

and

$$\sum_i p_i \ell_i \ge -\sum_i p_i \log_a p_i \qquad (2)$$

where the $p_i$ are any set of non-negative numbers such that $\sum_i p_i = 1$.

Proof: The two inequalities are proved in very similar fashion. We prove
(2) first. Choose a set of rational numbers $q_i$ whose sum is one and
which are close approximations to the $p_i$, so close that

$$\Bigl|\sum_i p_i \ell_i - \sum_i q_i \ell_i\Bigr| < \epsilon$$

and

$$\Bigl|\sum_i q_i \log_a q_i^{-1} - \sum_i p_i \log_a p_i^{-1}\Bigr| < \epsilon. \qquad (3)$$

This is possible for any $\epsilon > 0$, since both $\sum q_i \ell_i$ and $\sum q_i \log_a q_i^{-1}$ are
continuous functions of the $q_i$ in the range of allowed values. Now
consider all sequences of words which contain exactly $mq_1$ occurrences of
word $W_1$, $mq_2$ occurrences of word $W_2$, etc. Here, m is any multiple of
the least common denominator of the $q_i$. All of these sequences contain
exactly m words and are of length exactly $\sum_i mq_i\ell_i$. The number of these
sequences is the multinomial coefficient, and by Stirling's formula it is at least

$$\frac{m!}{\prod_i (mq_i)!} \ge \frac{1}{K_m}\, a^{\,m\sum_i q_i \log_a q_i^{-1}},$$

where $K_m$ is a correction factor that grows only as a power of m.
This total number of sequences must be less than or equal to $a^{\sum_i mq_i\ell_i}$,
since this is the total number of possible sequences of the length in
question and each of the sequences we have constructed must be different
for unique decoding. Thus

$$a^{\,m\sum_i q_i\ell_i} \ge \frac{1}{K_m}\, a^{\,m\sum_i q_i \log_a q_i^{-1}}.$$

Taking logarithms to the base a and dividing by m,

$$\sum_i q_i\ell_i \ge -\sum_i q_i\log_a q_i - \frac{1}{m}\log_a K_m.$$

Using (3),

$$\sum_i p_i\ell_i \ge -\sum_i p_i\log_a p_i - \frac{1}{m}\log_a K_m - 2\epsilon.$$

Since $\epsilon$ is arbitrarily small and m can be arbitrarily large, we must have
the desired relation (2):

$$\sum_i p_i\ell_i \ge -\sum_i p_i\log_a p_i.$$

The inequality (1) is proved as follows. Let $p_i = A a^{-\ell_i}$, where A
is chosen so that $\sum_i A a^{-\ell_i} = 1$. Choose a set of rational $q_i$ summing
to one and approximating to the $p_i$, in the sense that

$$\Bigl|\sum_i p_i\ell_i - \sum_i q_i\ell_i\Bigr| < \delta$$

$$\Bigl|\sum_i p_i\log_a p_i^{-1} - \sum_i q_i\log_a q_i^{-1}\Bigr| < \delta.$$

Choose an integer m such that the $q_i m$ are all integers and consider
sequences containing exactly $q_i m$ occurrences of word $W_i$. Thus there are
m words in each sequence, and their length is $\sum_i q_i m \ell_i$. The total number
of sequences we construct is less than or equal to the total number
available, since the unique decodability makes them all different. Hence,

$$a^{\sum_i q_i m \ell_i} \ge \frac{m!}{(q_1 m)!\,(q_2 m)! \cdots (q_s m)!}.$$

Using the lower bound on the multinomial coefficient as before and taking
logarithms to the base a, we arrive at

$$\sum_i q_i\ell_i \ge -\sum_i q_i\log_a q_i - \frac{1}{m}\log_a K_m.$$

Exactly the same argument as before leads to

$$\sum_i p_i\ell_i \ge -\sum_i p_i\log_a p_i$$

and replacing $p_i$ by its value $A a^{-\ell_i}$,

$$\sum_i A a^{-\ell_i}\ell_i \ge -\sum_i A a^{-\ell_i}\log_a\bigl(A a^{-\ell_i}\bigr) = \sum_i A a^{-\ell_i}\ell_i - \log_a A \sum_i A a^{-\ell_i}$$

$$0 \ge -\log_a A$$

$$A = \Bigl(\sum_i a^{-\ell_i}\Bigr)^{-1} \ge 1.$$

This is the desired result (1).
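
As a quick numerical illustration of inequalities (1) and (2), the sketch below checks them for one decodable code. The example code (the binary words 0, 10, 110, 111), the probabilities, and the helper names `kraft_sum`, `mean_length`, and `entropy_base` are illustrative assumptions, not taken from the text.

```python
import math

def kraft_sum(lengths, a):
    """Left side of (1): the sum of a**(-l_i) over the word lengths."""
    return sum(a ** (-l) for l in lengths)

def mean_length(p, lengths):
    """Left side of (2): the sum of p_i * l_i."""
    return sum(pi * li for pi, li in zip(p, lengths))

def entropy_base(p, a):
    """Right side of (2): -sum p_i log_a p_i, taking 0 log 0 as 0."""
    return -sum(pi * math.log(pi, a) for pi in p if pi > 0)

# Hypothetical example: the binary prefix code 0, 10, 110, 111 (a = 2).
a = 2
lengths = [1, 2, 3, 3]
p = [0.4, 0.3, 0.2, 0.1]

print("sum of a^(-l_i):", kraft_sum(lengths, a))          # should not exceed 1
print("mean length    :", mean_length(p, lengths))        # should be at least ...
print("entropy base a :", entropy_base(p, a))             # ... this value
```

Any other choice of probabilities summing to one should satisfy (2) as well, since the theorem holds for every such set.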
Convexity of Channel Capacity as a Function of Transition Probabilities

Theorem: The channel capacity for transition probabilities $p_i(j)$ is a
convex downward function of these probabilities. That is, the capacity
C for the probabilities $r_i(j) = \frac{1}{2}\bigl(p_i(j) + q_i(j)\bigr)$ satisfies the inequality

$$C \le \tfrac{1}{2}(C_1 + C_2)$$

where $C_1$ is the capacity with probabilities $p_i(j)$ and $C_2$ that with
probabilities $q_i(j)$.

Proof: Let the capacity of the $r_i(j)$ channel be achieved by the input
probabilities $P_i$. Now consider the following channel. There are as
many inputs as in the given channels but twice as many outputs, a set j
and a set j'. Each input i has transition probabilities $\frac{1}{2}p_i(j)$ to the
outputs j and $\frac{1}{2}q_i(j')$ to the outputs j'. Now,
this is the channel we would obtain by halving all probabilities in the
$p_i(j)$ and the $q_i(j)$ channels and identifying the corresponding inputs
but leaving the outputs distinct. We note that if the corresponding
outputs are identified, the channel reduces to the $r_i(j)$ channel. We
note also that without this identification the channel looks like one
which half the time acts like the $p_i(j)$ channel and half the time the
$q_i(j)$ channel. An identification of certain outputs always reduces
(or leaves equal) the rate of transmission. Let this channel be used with
probabilities $P_i$ for the input symbols. Then this inequality in rates
may be written

$$H(x) - \bigl(\tfrac{1}{2}H_{y1}(x) + \tfrac{1}{2}H_{y2}(x)\bigr) \ge H(x) - H_y(x) = C$$

where $H_{y1}(x)$ is the conditional entropy of x when y is in the j group and
$H_{y2}(x)$ that when y is in the j' group. Splitting H(x) into two parts to
combine with the $H_{y1}(x)$ and $H_{y2}(x)$, we obtain

$$\tfrac{1}{2}R_1 + \tfrac{1}{2}R_2 \ge C$$

where $R_1$ is the rate for the $p_i(j)$ channel when the inputs have
probabilities $P_i$ and $R_2$ is the similar quantity for the $q_i(j)$ channel. These
rates, of course, are less than or equal to, respectively, $C_1$ and $C_2$, since the
capacities are the maximum possible rates. Hence we get

$$\tfrac{1}{2}(C_1 + C_2) \ge C.$$
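
The inequality can be illustrated on a family of channels whose capacity has a closed form. The sketch below assumes binary symmetric channels, for which the capacity in bits is 1 - H(p) at crossover probability p (a standard fact, not derived in this section). Averaging the transition probabilities of two such channels gives a binary symmetric channel with the averaged crossover probability. The function names and the crossover values 0.1 and 0.3 are illustrative.

```python
import math

def binary_entropy(p):
    """Entropy in bits of the two-point distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity in bits of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

# Two hypothetical channels and the channel with averaged transition probabilities.
p1, p2 = 0.1, 0.3
c1 = bsc_capacity(p1)
c2 = bsc_capacity(p2)
c_mixed = bsc_capacity(0.5 * (p1 + p2))

print("C of averaged channel:", c_mixed)
print("average of C1 and C2 :", 0.5 * (c1 + c2))
```

The capacity of the averaged channel should come out no larger than the average of the two capacities, as the theorem states.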

A  Geometric  Interpretation  of  Channel  Capacity 

The calculations involved in determining the rate R and channel
capacity C for a discrete memoryless channel can be given an interesting
geometric formulation that leads to some insights into the properties of these
quantities.

Let a channel be defined by the matrix $\|p_i(j)\|$ of transition
probabilities from input letter i to output letter j ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$).
We can think of each row of this matrix as defining a vector or a point in
a b - 1 dimensional simplex (the b - 1 dimensional analog of triangle,
tetrahedron, etc.). The coordinates of the point sum to one, $\sum_j p_i(j) = 1$,
and they are known as barycentric coordinates. They correspond, for example,
to the coordinates a chemist uses when he describes an alloy in terms of the
fractions of various components, and chemists often plot properties of alloys
in a simplex of one, two or three dimensions (line segment, triangle, or
tetrahedron).

We thus associate a point or vector $A_i$ with input i. Its components
are equal to the probabilities of various output letters if only this input
were used. If all the inputs are used, with probability $P_i$ for input i,
the probabilities of the output letters are given by the components of the
vector sum

$$Q = \sum_i P_i A_i.$$

Q is a vector or point in the simplex corresponding to the output letter
probabilities. Its jth component is $\sum_i P_i\, p_i(j)$.


Now, for notational convenience, we define the entropy of a point or a
vector in a simplex to be that of the barycentric coordinates of the point
interpreted as probabilities. Thus we write

$$H(A_i) = -\sum_j p_i(j)\log p_i(j), \qquad i = 1, 2, \ldots, a$$

$$H(Q) = -\sum_j \Bigl(\sum_i P_i\, p_i(j)\Bigr)\log\Bigl(\sum_i P_i\, p_i(j)\Bigr) = \text{entropy of received distribution}.$$

In this notation, the rate of transmission R for a given set of input
probabilities $P_i$ is given by

$$R = H\Bigl(\sum_i P_i A_i\Bigr) - \sum_i P_i H(A_i) = H(Q) - \sum_i P_i H(A_i).$$
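
The formula for R translates directly into a few lines of code. The sketch below is a minimal illustration, assuming natural logarithms and representing the channel as a list of rows (the vectors A_i); the function names and the two-input example channel are hypothetical.

```python
import math

def entropy(point):
    """Entropy (in nats) of a point given by its barycentric coordinates."""
    return -sum(x * math.log(x) for x in point if x > 0)

def received_point(P, A):
    """Q = sum_i P_i A_i, the vector of output-letter probabilities."""
    b = len(A[0])
    return [sum(P[i] * A[i][j] for i in range(len(A))) for j in range(b)]

def rate(P, A):
    """R = H(Q) - sum_i P_i H(A_i)."""
    Q = received_point(P, A)
    return entropy(Q) - sum(P[i] * entropy(A[i]) for i in range(len(A)))

# Hypothetical example: two inputs, two outputs, crossover probability 0.1.
A = [[0.9, 0.1], [0.1, 0.9]]
P = [0.5, 0.5]
print("Q =", received_point(P, A))
print("R =", rate(P, A), "nats")
```

When all rows of the matrix coincide, every A_i equals Q and the computed rate is zero, in line with the geometric picture of the dome and the plane beneath it.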
The function H(Q), where Q is a point in the simplex, is a convex
upward function. For if the components of Q are $x_i$ we have

$$H = -\sum_i x_i \log x_i$$

$$\frac{\partial H}{\partial x_i} = -(1 + \log x_i)$$

$$\frac{\partial^2 H}{\partial x_i\,\partial x_j} = \begin{cases} 0, & i \ne j \\ -1/x_i, & i = j. \end{cases}$$

Hence $\sum_{ij} H_{ij}\,\Delta x_i\,\Delta x_j = -\sum_i (\Delta x_i)^2 / x_i$ is a negative definite form. This
is true in the space of all non-negative $x_i$ and, hence, certainly in the
subspace where $\sum_i x_i = 1$. It follows that the rate R above is always
non-negative and, indeed, since H is strictly convex (no flat regions), that R
is positive unless $\sum_i P_i A_i = A_j$ whenever $P_j \ne 0$.

The process of calculating R can be visualized readily in the cases
of two or three output letters. With three output letters, imagine an
equilateral triangle on the floor for the simplex containing the points
$A_i$ and Q. Above this triangle is a rounded dome like the Kresge Auditorium.
The height of the dome at any point A is H(A). If there were three input
letters with corresponding vectors $A_1$, $A_2$, $A_3$, these correspond to three
points in the triangle and, straight up from these, to three points on the
dome. Any received vector $Q = \sum_i P_i A_i$ is a point within the triangle on the
floor defined by $A_1$, $A_2$, $A_3$. H(Q) is the height of the dome above the Q point
and $\sum_i P_i H(A_i)$ is the height above Q of the plane defined by the three dome
points over $A_1$, $A_2$, $A_3$. In other words, R is the vertical distance over Q
from the dome down to the plane defined by these three points.

The capacity C is the maximum R. Consequently in this particular
case it is the maximum vertical distance from the dome to the plane. This
clearly occurs at the point of tangency of a plane tangent to the dome and
parallel to the plane defined by the input letters.

If there were four input letters, they would define a triangle or
a quadrilateral on the floor depending on their positions, and their vertical
points in the dome would in general define a tetrahedron. Using them with
different probabilities would give any point in the tetrahedron as the
subtracted value $\sum_i P_i H(A_i)$. Clearly, the maximum R would occur by choosing
probabilities which place this subtracted part on the lower surface of the
tetrahedron.

These remarks also apply if there are still more input letters. If
there are a input letters they define an a-gon or less in the floor and the
vertically overhead points in the dome produce a polyhedron. Any point in
the convex hull of the points obtained in the dome can be reached with
suitable choice of the $P_i$ and corresponds to some subtracted term in R.
It is clear that to maximize R and thus obtain C one need only consider
the lower surface of this convex hull.

It is also clear geometrically, from the fact that the lower surface
of the polyhedron is convex downward and the dome is strictly convex upward,
that there is a unique point at which the maximum R, that is C, occurs. For
if there were two such points, the point halfway between would be even
better since the dome would go up above the line connecting the points at
the top and would be at least as low at the bottom surface. The rate R is
thus a strictly convex function of the received vector Q.

It is also true that the rate R is a convex upward function of the
input probability vector (with a barycentric coordinates $P_1, P_2, \ldots, P_a$
rather than the b coordinates of our other vectors). This is true since
the Q vectors Q and Q' corresponding to the input probabilities $P_i$ and $P_i'$
are given by

$$Q = \sum_i P_i A_i, \qquad Q' = \sum_i P_i' A_i.$$

The Q corresponding to $\alpha P_i + \beta P_i'$ (where $\alpha + \beta = 1$ and both are positive) is
$\alpha Q + \beta Q'$ and consequently the corresponding $R'' \ge \alpha R + \beta R'$, the desired result.
The equality can occur when Q = Q', so we cannot say in this case a strictly
convex function.

These last remarks also imply that the set S of $P_i$ vectors which maximize
the rate at the capacity C form a convex set in its a - 1 dimensional simplex. If
the maximum is obtained at two different points it is also attained at all
points on the line segment joining these points. Furthermore, any local
maximum of R is the absolute maximum C, for if not, join the points
corresponding to the local maximum and the absolute maximum. The value of R must lie
on or above this line by the convexity property, but must lie below it when
sufficiently close to the local maximum to make it a local maximum. This
contradiction proves our statement.

Another property we may deduce is that the capacity C can always be
attained using not more than b of the input letters. This is because any
point on the surface of a b-dimensional polyhedron is interior to some face.
This face may be subdivided into b - 1 dimensional simplexes (if it is not
already a simplex). The point is then interior to one of these. The
vertices of the simplex are b input letters, and the desired point can be
expressed in terms of these.

This picture gives considerable information concerning which input
letters should be used to achieve channel capacity. If the vector $A_t$, say,
corresponding to input letter t, is interior to the convex hull of the
remaining letters, it need not be used. Thus, suppose $A_t = \sum_{i \ne t} \alpha_i A_i$
where $\sum_i \alpha_i = 1$, $\alpha_i \ge 0$. Then by the convexity properties $H(A_t) \ge \sum_i \alpha_i H(A_i)$.
If by using the $A_i$ with probabilities $P_i$ we obtain a rate $R = H\bigl(\sum_i P_i A_i\bigr) - \sum_i P_i H(A_i)$,
then a rate greater than or equal to R can be obtained by expressing
$A_t$ in terms of the other $A_i$, for this leaves unaltered the first term
of R and decreases or leaves constant the sum.

In the case of only two output letters the situation is extremely
simple. Whatever the number of input letters, only two of them need be used to
achieve channel capacity. These two will be those with the maximum and
minimum transition probabilities to one of the output letters. These values, $p_{\max}$
and $p_{\min}$, are then located in the one-dimensional simplex, a line segment
of unit length, and projected upward to the H-curve as shown in the figure. The
secant line is drawn and the capacity is the largest vertical distance from
the secant to the curve. The probabilities to achieve this capacity are in
proportion to the distances from this point to the two ends of the secant.
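
The secant construction for two output letters can be sketched numerically. The code below is an illustration under stated assumptions: entropies are in nats, and the point of largest vertical gap is located by setting the slope of the H-curve, log((1 - q)/q), equal to the slope of the secant, a step that is not spelled out in the text. The function names and the sample channels are hypothetical.

```python
import math

def h(q):
    """Entropy (nats) of the two-point distribution (q, 1 - q)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

def two_output_capacity(p_min, p_max):
    """Largest vertical gap between the H-curve and its secant.

    p_min and p_max are the smallest and largest transition probabilities
    to one chosen output letter. Returns (capacity in nats, q_star), where
    q_star is the output probability at which the gap is largest.
    """
    if p_max <= p_min:
        return 0.0, p_min  # all inputs coincide, so nothing can be sent
    slope = (h(p_max) - h(p_min)) / (p_max - p_min)
    # The gap is largest where the slope of the curve equals that of the secant.
    q_star = 1.0 / (1.0 + math.exp(slope))
    secant_height = h(p_min) + slope * (q_star - p_min)
    return h(q_star) - secant_height, q_star

# Hypothetical examples: a symmetric channel and a one-sided channel.
print(two_output_capacity(0.1, 0.9))
print(two_output_capacity(0.0, 0.5))
```

For the symmetric example the secant is horizontal and the largest gap sits at the midpoint of the segment, so the capacity is log 2 minus the entropy of either input's row.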

In the case of three output letters, the positions of all vectors
corresponding to input letters may be plotted in an equilateral triangle.
The circumscribing polygon (convex hull) of these points may now be taken
and any points interior to this polygon (including those on edges) may be
deleted. What is desired is the lower surface of the polyhedron determined
by the points in the H-surface above these points. This lower surface, in
general, will consist of triangles and the problem is to determine which
vertices are connected by edges. A method of doing this is to consider a
line joining a pair of vertices and then to calculate for other lines whose
projections on the floor cross this line, whether they are above it or below
it in space. If there is no line below the first line, this line is an
edge on the lower surface of the polyhedron. If a second line is found
below the first line this one may be tested in a similar fashion, and
eventually an edge is isolated. This edge divides the projection into two smaller
polygons and these may now be studied individually by the same means.
Eventually, the original polygon will be divided by edges into a set of
polygons corresponding to faces of the polyhedron. Each of these polygons may
then be examined to determine whether or not the point of tangency of the
parallel plane which is tangent to the H-surface lies over the polygon.
This will happen in exactly one of the polygons and corresponds to the Q for
maximum R.

Log Moment Generating Function for the Square of a Gaussian Variate
Suppose x is a gaussian random variable with variance $\sigma^2$. Its
density function is

$$p(x)\,dx = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-x^2/2\sigma^2}\,dx.$$

The random variable $u = x^2$ will have a density distribution q(u) obtained
by substituting $x = \sqrt{u}$, $dx = du/2\sqrt{u}$ and then multiplying the result by
2. This last operation takes account of the two halves of the original
distribution which both go into the positive u range. The result of these
substitutions is

$$q(u)\,du = \frac{1}{\sqrt{2\pi\sigma^2 u}}\; e^{-u/2\sigma^2}\,du, \qquad u > 0.$$

The moment generating function $\varphi(s)$ is calculated as follows:

$$\begin{aligned}
\varphi(s) &= \int_0^\infty e^{su}\, q(u)\,du \\
&= \int_0^\infty \frac{1}{\sqrt{2\pi\sigma^2 u}}\; e^{-\left(\frac{1}{2\sigma^2} - s\right)u}\,du \\
&= \frac{1}{\sqrt{1 - 2s\sigma^2}} \int_0^\infty \frac{1}{\sqrt{2\pi\sigma'^2 u}}\; e^{-u/2\sigma'^2}\,du \\
&= \frac{1}{\sqrt{1 - 2s\sigma^2}}.
\end{aligned}$$

In the third expression we make the substitution $\frac{1}{2\sigma'^2} = \frac{1}{2\sigma^2} - s$. The integral
in the third line is recognized as integrating to 1, being, in fact, a
special case of the density function q(u) above. Notice that the integral
and hence $\varphi(s)$ exist only when $s < \frac{1}{2\sigma^2}$.

The log of the moment generating function and other useful functions
can now be calculated. We have

$$\mu(s) = \log\varphi(s) = -\tfrac{1}{2}\log(1 - 2s\sigma^2)$$

$$\mu'(s) = \frac{\sigma^2}{1 - 2s\sigma^2}$$

$$\mu(s) - s\mu'(s) = -\tfrac{1}{2}\log(1 - 2s\sigma^2) - \frac{s\sigma^2}{1 - 2s\sigma^2}$$

$$\mu(s) - (s - 1)\mu'(s) = -\tfrac{1}{2}\log(1 - 2s\sigma^2) - \frac{(s - 1)\sigma^2}{1 - 2s\sigma^2}$$

$$\mu''(s) = \frac{2\sigma^4}{(1 - 2s\sigma^2)^2} = 2(\mu')^2.$$
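
These closed forms can be checked numerically. The sketch below assumes the formulas μ(s) = -½ log(1 - 2sσ²), μ'(s) = σ²/(1 - 2sσ²) and μ''(s) = 2(μ')², and compares the moment generating function with a direct trapezoid-rule integration of e^{sx²} against the gaussian density. The function names, step count, and sample values of s and the variance are illustrative choices.

```python
import math

def log_mgf(s, var):
    """mu(s) = -1/2 log(1 - 2 s var), defined only for s < 1/(2 var)."""
    if s >= 1.0 / (2.0 * var):
        raise ValueError("the moment generating function does not exist for s >= 1/(2 var)")
    return -0.5 * math.log(1.0 - 2.0 * s * var)

def log_mgf_d1(s, var):
    """First derivative mu'(s)."""
    return var / (1.0 - 2.0 * s * var)

def log_mgf_d2(s, var):
    """Second derivative mu''(s)."""
    return 2.0 * var ** 2 / (1.0 - 2.0 * s * var) ** 2

def mgf_numeric(s, var, steps=20000):
    """Trapezoid-rule estimate of E[exp(s x^2)] for gaussian x with variance var."""
    coeff = 1.0 / (2.0 * var) - s               # the integrand is exp(-coeff * x^2)
    half_width = 12.0 / math.sqrt(2.0 * coeff)  # about 12 effective standard deviations
    step = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-coeff * x * x)
    return total * step / math.sqrt(2.0 * math.pi * var)

s, var = 0.2, 1.5
print("closed form:", math.exp(log_mgf(s, var)))
print("numerical  :", mgf_numeric(s, var))
```

The numerical integral only makes sense for s below 1/(2σ²), which is the same existence condition noted above for the integral.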
Upper Bound on $P_e$ for Gaussian Channel by Expurgated Random Code

In the gaussian channel with average power limitation we assume code
words chosen at random in a sphere of radius $\sqrt{P}$. If the number of
dimensions n is large enough, the fraction of points at a radius between
$(1 - \delta)\sqrt{P}$ and $\sqrt{P}$ will be greater than $1 - \epsilon$ for any positive $\epsilon$ and $\delta$.
We wish to calculate the rate $R = \frac{1}{n}\log M$ for a random code such that the
expected number of code points within D of a given code point is less than
or equal to one-half. In the figure


O is the origin, X is a code word at radius $\sqrt{P}$. The sphere of radius D
centered on X intersects the original sphere of radius $\sqrt{P}$ in an (n - 1)
sphere whose intersection with the plane of our drawing consists of the points Y and
Z. All points interior to both spheres are included in the sphere with
center on OX and radius $\sqrt{A}$ (in n dimensions). Hence, the volume common to
the two spheres is less than or equal to the volume of this sphere, which
is $K_n (\sqrt{A})^n$, where $K_n$ is the coefficient of $r^n$ in the formula
for the volume of an n-sphere. The total volume of the $\sqrt{P}$ sphere is $K_n (\sqrt{P})^n$.
If there are $e^{nR}$ points chosen at random in the $\sqrt{P}$ sphere, the expected
number within distance D of one of the points, such as X, will be less than

$$e^{nR}\,\frac{K_n (\sqrt{A})^n}{K_n (\sqrt{P})^n}.$$

Now if $R = \log\sqrt{P/A} - \delta$, this expected number approaches zero as $n \to \infty$
for any $\delta > 0$. If the point X is not on the surface of
the $\sqrt{P}$ sphere but at a slightly smaller radius, $\sqrt{P} - \epsilon$, the radius of the
sphere, $\sqrt{A}$, is slightly larger, $\sqrt{A} + \delta_2$. However, by making $\epsilon$ approach
zero, $\delta_2$ approaches zero and its effect may be absorbed in $\delta$. Thus, if
in our original sphere with points distributed at random we first eliminate
all points except those within $\epsilon$ of the surface, the expected number within
D of one of the points will approach zero as $n \to \infty$ provided the rate $R = \log\sqrt{P/A} - \delta$
and $\epsilon$ is sufficiently small. By eliminating those points which have
neighbors within D we can still obtain a rate R as close as we wish to $\log\sqrt{P/A}$. Now,
since in the remaining expurgated code no point has a neighbor closer than
D, the probability of error may be calculated by our theorem on minimum
separations. It will be less than $e^{nR}$ times the probability of noise
carrying a point a distance D/2 or more. The distance D/2 can be related to
$\sqrt{A}$ by the obvious trigonometric equation

$$\frac{D}{2} = \sqrt{P}\,\sin\Bigl(\tfrac{1}{2}\sin^{-1}\sqrt{A/P}\Bigr).$$

Making use of the theorem on reliability for a given minimum separation,
and the asymptotic formula for large n for erf x, we obtain

$$E \ge \frac{P}{2N}\sin^2\Bigl(\tfrac{1}{2}\sin^{-1}\sqrt{A/P}\Bigr) - R.$$

Eliminating A by its relation to R, we get the final bound on reliability E

$$E \ge \frac{P}{2N}\sin^2\Bigl(\tfrac{1}{2}\sin^{-1} e^{-R}\Bigr) - R.$$

Note that as $R \to 0$ this lower bound approaches $\frac{P}{4N}$, the same value as the upper
bound on E previously derived. Thus we conclude that $E(0) = \frac{P}{4N}$.
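
The behavior of this bound near zero rate is easy to tabulate. The sketch below assumes the bound has the form E ≥ (P/2N) sin²(½ sin⁻¹ e^{-R}) - R, with R in nats, which agrees with the limiting value P/4N quoted for R → 0. The function name and the sample values of P and N are illustrative.

```python
import math

def expurgated_reliability_bound(R, P, N):
    """Assumed form of the lower bound: (P / 2N) sin^2(asin(exp(-R)) / 2) - R."""
    if R < 0:
        raise ValueError("the rate R must be non-negative")
    half_angle = 0.5 * math.asin(math.exp(-R))
    return (P / (2.0 * N)) * math.sin(half_angle) ** 2 - R

# Hypothetical signal and noise powers.
P, N = 4.0, 1.0
for R in (0.0, 0.05, 0.1, 0.2):
    print(R, expurgated_reliability_bound(R, P, N))
print("P / 4N =", P / (4.0 * N))
```

At R = 0 the half angle is a quarter turn of a right angle doubled, that is 45 degrees, so the squared sine is one half and the bound reduces to P/4N.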


Lower Bound on $P_e$ in Gaussian Channel by Minimum Distance Argument


In a code of length n with M code words, let $m_{is}$ ($i = 1, 2, \ldots, M$;
$s = 1, 2, \ldots, n$) be the s-th coordinate of code word i. We are assuming
an average power limitation P, so

$$\frac{1}{nM}\sum_{i,s} m_{is}^2 \le P. \qquad (1)$$

We also assume an independent Gaussian noise of power N added to each
coordinate.

We now calculate the average squared distance between all the M(M - 1)/2
pairs of points in n-space corresponding to the M code words. The squared
distance from word i to word j is $\sum_s (m_{is} - m_{js})^2$. The average $\overline{D^2}$ between
all pairs will then be

$$\overline{D^2} = \frac{1}{M(M-1)}\sum_{s,i,j} (m_{is} - m_{js})^2.$$

Note that each distance is counted twice in the sum and also that the
extraneous terms included in the sum, where i = j, contribute zero to the
sum. Squaring the terms in the sum,

$$\begin{aligned}
\overline{D^2} &= \frac{1}{M(M-1)}\Bigl[\,2M\sum_{i,s} m_{is}^2 - 2\sum_s\Bigl(\sum_i m_{is}\Bigr)^2\Bigr] \\
&\le \frac{1}{M(M-1)}\; 2M \cdot nMP \\
\overline{D^2} &\le \frac{2nMP}{M-1},
\end{aligned}$$

where we obtain the third line by using the inequality on the average power
(1) and by noting that the second term is necessarily non-positive.
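
The bound on the average squared distance can be checked on a concrete code. The sketch below draws a random code, takes P to be the code's own average power per coordinate (so the power constraint holds with equality), and compares the average pairwise squared distance with 2nMP/(M - 1). The function names, the random seed, and the code dimensions are illustrative assumptions.

```python
import random

def average_power(words):
    """Average power per coordinate: (1 / nM) * sum of all squared coordinates."""
    n = len(words[0])
    return sum(x * x for word in words for x in word) / (len(words) * n)

def average_pair_distance_sq(words):
    """Average squared distance over the M(M - 1)/2 pairs of code words."""
    M = len(words)
    total = 0.0
    for i in range(M):
        for j in range(i + 1, M):
            total += sum((a - b) ** 2 for a, b in zip(words[i], words[j]))
    return total / (M * (M - 1) / 2)

def distance_bound(words):
    """The bound 2nMP / (M - 1), with P taken as the code's average power."""
    M, n = len(words), len(words[0])
    return 2.0 * n * M * average_power(words) / (M - 1)

# Hypothetical random code with M = 16 words of length n = 8.
rng = random.Random(0)
words = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
print("average squared distance:", average_pair_distance_sq(words))
print("bound 2nMP/(M-1)        :", distance_bound(words))
```

Equality holds when the coordinates of the code words sum to zero in every dimension, for example for the two antipodal words +1 and -1 in one dimension.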


If the average squared distance between pairs of points is $\le \frac{2nMP}{M-1}$,
there must exist a pair of points for whose distance this inequality holds.
Each point in this pair is used $1/M$ of the time. The best detection for
separating this pair (if no other points were present) would be by a plane
normal to and bisecting the joining line segment, and either point would
then give rise to a probability of error equal to that of the noise
carrying a point half this distance or more in a specific direction. We arrive,
therefore, at a probability of error

$$P_e \ge \frac{2}{M}\,\Pr\Bigl[\text{noise in a certain direction} \ge \sqrt{\frac{nMP}{2(M-1)}}\;\Bigr],$$

where the noise in any one direction is gaussian with variance N, so that the
threshold is $\sqrt{\frac{nMP}{(M-1)2N}}$ standard deviations.

As $n \to \infty$ and assuming $M \to \infty$ also in such a way as to approach a definite
rate $\frac{1}{n}\log M \to R > 0$, we may translate this into a bound on the asymptotic
reliability. This is done by using the asymptotic formula for erf x, by which
the probability that a gaussian variate exceeds x standard deviations is
approximately $\frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}$ for large x.
Using this, taking the logarithm and dividing by n gives the simple upper
bound on reliability

The Sphere Packing Bound for the Gaussian Power Limited Channel

The analog of the sphere packing argument can be carried out in an
interesting geometrical fashion for the gaussian channel. We assume an
average power limitation P and an independent gaussian noise of n coordinates
with variance N in each coordinate. Consider the n-sphere S whose squared
radius is $P + \delta_1$. Since the average squared radius to the signal words is
P or less, a fraction at least $\frac{\delta_1}{P + \delta_1}$ of these words are within the $P + \delta_1$
sphere; for, if not, the fraction greater than $\bigl(1 - \frac{\delta_1}{P + \delta_1}\bigr)$ at distance at
least $P + \delta_1$ would give more than P for the contribution to the average
power by themselves. We will estimate the errors due to only the signal
words inside the $P + \delta_1$ sphere. Even if all code words outside this sphere
never caused errors and this minimum possible fraction $\frac{\delta_1}{P + \delta_1}$ were inside
the sphere, the probability of error for the entire code would be that of the
code consisting of these interior points multiplied by $\frac{\delta_1}{P + \delta_1}$, and in general
the probability of error will be greater than this. Thus the reliability of
the original code can be estimated from that of the interior points with
an error not exceeding $\frac{1}{n}\log\frac{P + \delta_1}{\delta_1}$.

n  , 

The argument we will use is similar to that in the discrete channel but
with certain complexities and refinements added. We consider a sphere of
suitably chosen radius $\sqrt{K}$. The volume $V_K$ of this sphere will be divided by
the decoding process for the code into a number of regions, regions which are
decoded as the various particular signal points. To each signal point we
will assign a certain volume $V_1$ of "high" probability density and a second
volume $V_2$ of "low" probability density. These regions $V_1$ and $V_2$ are congruent
for the different signal points. The probability density of a point being
carried by noise into any part of its $V_1$ region will be greater than the
density for any part of its $V_2$ region. Both of these regions will, for any
signal point, lie entirely within the sphere of radius $\sqrt{K}$. The conclusion
will be that for any placing of $V_K/V_1$ points the probability of error will
be at least equal to the probability of a point being carried into its $V_2$ region.
This is because, in a way similar to the discrete process, starting with the
original partitioning of $V_K$ we can reallocate volume assigned to a given
point in order of decreasing probability density and equalize allocation between
points until each point has $V_1$ assigned to it. These operations preserve
total volume and decrease (calculated) probability of error. When the
equalization is complete, each signal point has its $V_2$ region assigned entirely to
other points, and consequently the probability of error is at least that of
a point being taken to its $V_2$ region.
In the figure, $O$ is the origin, $X$ is a signal point at maximal radius
$\sqrt{P + \delta_1}$ and the large circle is the intersection of the $K$
sphere with the plane of the drawing. At $X$ we construct the hyperplane
perpendicular to $OX$, and let the distance from $X$ to the intersection of
this plane with the $K$ sphere be $\sqrt{\lambda N + \delta_2}$. Here, $N$ is
the average noise power, $\lambda$ is an arbitrary multiplier, and $\delta_2$
is a small quantity which will eventually approach zero. Now construct the
two hemispheres of radii $\sqrt{\lambda N}$ and $\sqrt{\lambda N + \delta_2}$
centered on $X$, pointed toward $O$ and bounded by the hyperplane. It is clear
that the entire volumes of both of these hemispheres are within the large
$K$ sphere. The smaller hemisphere is the $V_1$ region for signal point $X$
and the shell between the hemispherical surfaces is the $V_2$ region. For any
other signal point, a similar pair of hemispheres is constructed by drawing
the line from the origin to the signal point, constructing the perpendicular
hyperplane and constructing hemispheres of radii $\sqrt{\lambda N}$ and
$\sqrt{\lambda N + \delta_2}$ facing toward the origin. If the origin itself
were a signal point, any hyperplane through the origin may be used. It is
obvious in the drawing that any point of these hemispheres actually in the
plane of the drawing is within the $K$ sphere (being nearer to the origin
than $\sqrt{K}$). But the plane of the drawing may be made to pass through
any desired point in the hemisphere by suitable rotation, hence the property
is true in general.

Since probability density for a given displacement from a signal point
is a monotone decreasing function of the actual distance of displacement, the
probability density for any point in the shell is less than that for any
point in the inner hemisphere. Let $M$ be the number of signal points such
that the combined volume of their small hemispheres is just equal to that of
the $K$ sphere. Thus

$$ M V_1 = V_K, \qquad M = 2 \left( \frac{K}{\lambda N} \right)^{n/2}, \qquad K = P + \lambda N + \delta_1 + \delta_2 . $$


Now, whatever the decoding system or the placement of $M$ points interior to
the $P + \delta_1$ sphere, the probability of error $P_e$ (due only to errors
inside the $K$ sphere) will exceed the probability of a point being carried
into its $V_2$ shell. This follows from our general argument concerning
reallocation of volume in accordance with higher probability. Thus if the
message placement and decoding system allocate any volume in shells or other
low probability density regions to code points, a lower calculated $P_e$ would
occur if this were calculated as though at the higher probability density of
the inner hemisphere. When this reallocation is finished, we have a
probability of error satisfying

$$ P_e \ge \tfrac{1}{2} \Pr\{ \lambda N \le Z \le \lambda N + \delta_2 \} , $$

where $Z$ is the squared radial displacement of a point due to noise (divided
by $n$). Since $nZ$ is the sum of the squares of $n$ independent gaussian
variates each with variance $N$, $Z$ is distributed (apart from scale)
according to the $\chi^2$ distribution with $n$ degrees of freedom. Thus

$$ P_e \ge \frac{1}{2} \int_{n\lambda}^{n(\lambda + \delta_2 / N)} \frac{w^{n/2 - 1} e^{-w/2}}{2^{n/2}\, \Gamma(n/2)}\, dw . $$

For any given $\delta > 0$, the logarithm of the $\chi^2$ distribution from
$\lambda$ to $\lambda + \delta$ is asymptotic to
$-\frac{n}{2} \log \frac{e^{\lambda - 1}}{\lambda}$. This can be easily shown
by use of the moment generating function and the results on the tails of
distributions obtained previously. Consequently our reliability $E$, as
$n \to \infty$, is asymptotically less than or equal to
$\frac{1}{2} \log \frac{e^{\lambda - 1}}{\lambda}$. Also the rate

$$ R = \frac{1}{n} \log M \sim \frac{1}{2} \log \frac{P + \lambda N + \delta_1 + \delta_2}{\lambda N} . $$

Since this is true for any $\delta_1, \delta_2 > 0$, we may omit them
entirely and obtain asymptotic bounds for $R$ and $E$ as follows.

$$ R = \frac{1}{2} \log \frac{P + \lambda N}{\lambda N}, \qquad E \le \frac{1}{2} \log \frac{e^{\lambda - 1}}{\lambda} = \frac{1}{2} (\lambda - 1 - \log \lambda) . $$

These formulas give an upper bound on the reliability curve in a parametric
form using the parameter $\lambda$ which ranges from 1 to $\infty$. With
$\lambda$ just greater than 1, we have a rate just below channel capacity and
a reliability bound which is just slightly positive. As the value of
$\lambda$ increases, the rate $R$ decreases and the bound on $E$ increases,
becoming infinite when $\lambda$ is infinite and the bound on rate is zero.
Of course the bound based on minimum distance shows that the actual $E$ curve
does not exceed $\frac{P}{4N}$ as $R \to 0$.
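
The parametric form of this bound is easy to tabulate. The following is a
minimal sketch, assuming the bound reads $R = \frac{1}{2}\log\frac{P+\lambda N}{\lambda N}$
and $E \le \frac{1}{2}(\lambda - 1 - \log\lambda)$ with natural logarithms; the
function names `sphere_packing_point` and `capacity` are illustrative only.

```python
import math


def capacity(P, N):
    """Capacity (nats per coordinate) of the gaussian channel with power P, noise N."""
    return 0.5 * math.log(1.0 + P / N)


def sphere_packing_point(lam, P, N):
    """Return (R, E_bound) in nats for a multiplier lam >= 1.

    Assumed parametric form (see the lead-in):
      R       = 0.5 * log((P + lam*N) / (lam*N))
      E_bound = 0.5 * (lam - 1 - log(lam))
    """
    if lam < 1.0:
        raise ValueError("the parameter is assumed to range from 1 upward")
    rate = 0.5 * math.log((P + lam * N) / (lam * N))
    e_bound = 0.5 * (lam - 1.0 - math.log(lam))
    return rate, e_bound


if __name__ == "__main__":
    P, N = 4.0, 1.0
    print("capacity:", round(capacity(P, N), 4))
    for lam in (1.0, 1.5, 2.0, 5.0, 20.0):
        r, e = sphere_packing_point(lam, P, N)
        print(f"lambda={lam:5.1f}  R={r:.4f}  bound on E={e:.4f}")
```

At `lam = 1` the rate equals the capacity and the bound on E is zero; as `lam`
grows the rate falls toward zero while the bound grows without limit.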


The T-terminal Channel

Almost all previous work on coding theory has dealt with a one-directional
channel having an input or transmitting point and an output or receiving
point, or, at most, with this arrangement plus a feedback channel from the
receiving point to the transmitting point whose function was thought of as a
possible aid in forward communication. Many cases arise, however, in which
a number of information terminals are involved and both backward and forward
communication is of interest, perhaps between all pairs of terminals. As
examples we may cite telephony (or even ordinary direct conversation) where
communication in both directions is important, or a network of radio or
television stations in which there are a number of communication links
using a common medium.

A further complication is introduced by the possibility of competition
or conflicting interest among the individuals controlling the operation of
the various terminals. As an example we have the case of a secrecy system
which is best thought of as a three-terminal channel with the transmitter
as one input, a receiver as one output and the enemy cryptanalyst as a
second output. The object is to transmit information from the transmitter
to the receiver without knowledge by the enemy. A second example is the
problem of "jamming", again a three-terminal channel, but now the enemy
has an input rather than an output and his object is to reduce or eliminate
the direct transmission of information.

These possibilities suggest that we should frame general definitions of
T-terminal channels and study their characteristics from the information
theoretic point of view. We shall here, for simplicity, limit ourselves to
the discrete case quantized in time.


Definition: A T-terminal finite state channel consists of $T$ inputs $x_i$
($i = 1, 2, \ldots, T$) each of which may assume values from a finite alphabet
(not necessarily the same for the different inputs), $T$ outputs
$y_1, y_2, \ldots, y_T$ each of which can assume values from an associated
finite alphabet, and a state variable $S$ which can assume any of a finite
set of values $1, \ldots, D$. Finally, there are conditional probabilities for
the next outputs and the next state conditional on the current inputs and
current state:

$$ \Pr(y_i' \mid S, x_1, x_2, \ldots, x_T) \quad \text{and} \quad \Pr(S' \mid S, x_1, x_2, \ldots, x_T) . $$

Definition: A memoryless T-terminal finite state channel is one in which the
state $S$ can assume only a single value.

Definition: A noiseless T-terminal discrete channel is one in which all
probabilities are either 0 or 1. Thus, the next state and the next outputs
are strictly determined by the current state and current inputs. In the
noiseless memoryless case, this state can have only one value so the next
outputs are functions of the current inputs.

In operation of a T-terminal channel we imagine operators or equipment
at each of the terminals. Also at each terminal, in general, will be an
information source. The operators are attempting to transmit information
produced by the sources between the terminals according to some general plan
and system of codes which has been agreed upon. In general, the operator
at terminal $i$ can control the input $i$ but only as a function of the data
available to him at the time. This includes the past and present of output
$i$ and the output of message source $i$ up to the present time but not the
future of these random functions, nor any of the other inputs, outputs or
message sources (past or future).


We will first consider the completely cooperative situation in which
the operation of all terminals is directed toward a common end. The problem
is very similar to a one-person game in the game theoretic sense with
"split personality" for the player. We can think of the operators of the
various terminals conferring at the beginning on a general strategy, selection
of codes and decoding operations, and then going to their respective terminals
and operating the system according to the agreed-upon plan. Together they
act like a single player whose knowledge in making different moves is not
coextensive.

In the more general case, one may consider a p-person game in which the
$T$ terminals are partitioned into $p$ subsets, the operators in each subset
having a common purpose which may conflict with those of other subsets. The
operators in a given subset agree on a strategy to promote their goals and
act as one person in a kind of p-person game.

In the fully cooperative case there are many utilities one might wish
to maximize in a given channel. In line with basic coding theory, however,
our attention is directed to the question of generalizing the coding theorem
for a noisy channel to this kind of a situation. In other words, we would
like to find the capabilities and limitations of a T-terminal channel with
regard to essentially errorless transmission of information between the
different terminals. At a given terminal, say terminal 1, we may imagine that
the information source 1 produces information which is destined for various
other terminals $2, 3, \ldots, T$. It might also produce some information
which was intended for both terminals 2 and 3, and some for both 2 and 4,
etc., and indeed it might have a component intended for any subset of the
other terminals. The same may of course be said of any other terminal. In
general, we think of each message source as producing not one but
$2^{T-1} - 1$ streams of independent information intended for the
$2^{T-1} - 1$ subsets (omitting the null subset) of the other $T - 1$
terminals.

A simple two-terminal one-way channel is characterised at the simplest
coding level by its capacity $C$. In the T-terminal case, we must consider
the capacities of all the different types just described, that is,
$C_{i\sigma}$, the capacity from terminal $i$ to subset $\sigma$ of the
remaining terminals, a total of $T(2^{T-1} - 1)$ different capacities.
Furthermore, these are not fixed quantities but, in general, capable of some
variability. Thus, one may increase one of these capacities at the expense of
reducing another. Our fundamental problem is not to evaluate a single $C$ as
before but to find which sets of values of $C_{i\sigma}$ are possible.
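
The count $T(2^{T-1} - 1)$ can be checked by listing the (terminal, subset)
pairs directly. This is a small sketch; the function name `capacity_labels`
and the numbering of terminals from 1 to T are illustrative assumptions.

```python
from itertools import combinations


def capacity_labels(T):
    """List every pair (i, subset): a sending terminal i and a non-null
    subset of the other T - 1 terminals that the information is meant for."""
    labels = []
    for i in range(1, T + 1):
        others = [t for t in range(1, T + 1) if t != i]
        for size in range(1, len(others) + 1):
            for subset in combinations(others, size):
                labels.append((i, subset))
    return labels


if __name__ == "__main__":
    for T in (2, 3, 4):
        print(T, len(capacity_labels(T)), T * (2 ** (T - 1) - 1))
```

For T = 2 this gives only the two pairs (1, (2,)) and (2, (1,)), the two
capacities of the two-terminal case.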

In the case of only two terminals but with an input and output at each
terminal, there are only two different capacities $C_{i\sigma}$, since there
is only one non-null subset of the remaining terminals. These capacities we
may write $C_{12}$ and $C_{21}$. Our problem is to find the possible values of
the pair $(C_{12}, C_{21})$ or, better, the boundary of this domain in the
$C_{12}, C_{21}$ space. This boundary may be called the capacity surface.

The channel in Fig. 1 is a simple example where the two boxes represent
ordinary one-way memoryless channels with capacities $C_1$ and $C_2$. The graph
at the right of Fig. 1 shows the region of attainable rates in the two
directions and the heavy line boundary of this is the capacity surface. In
this case transmission in either direction neither aids nor hinders
transmission in the reverse direction (feedback cannot increase forward
transmission in a memoryless channel).


The channel in Fig. 2 is more interesting from this point of view. The
two binary inputs from the two terminals are added mod 2 and the output is
a common output going to both terminals. Here again it is possible to
achieve points in a rectangle. Note that at each transmitter the transmitted
symbol should be added mod 2 to the next received symbol to compensate for
its effect. It is curious that, in a sense, two bits per time interval are
going through the vertical line of the drawing, one destined in each
direction.
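
The compensation step in the mod 2 channel can be sketched in a few lines.
This is an illustration under simplifying assumptions: the channel is taken
to be noiseless, the common output is paired with the inputs of the same time
step (any fixed delay is ignored), and the name `run_mod2_channel` is
hypothetical.

```python
def run_mod2_channel(bits_left, bits_right):
    """Simulate the channel whose common output is the mod 2 sum of both inputs.

    Each terminal adds its own transmitted bit (mod 2) to the received bit,
    which cancels its own contribution and leaves the other terminal's bit.
    """
    recovered_at_left = []
    recovered_at_right = []
    for a, b in zip(bits_left, bits_right):
        y = a ^ b                         # common output seen by both terminals
        recovered_at_left.append(y ^ a)   # left cancels its own bit, obtains b
        recovered_at_right.append(y ^ b)  # right cancels its own bit, obtains a
    return recovered_at_left, recovered_at_right


if __name__ == "__main__":
    left = [0, 1, 1, 0, 1]
    right = [1, 1, 0, 0, 1]
    print(run_mod2_channel(left, right))
```

Both terminals recover one full bit per use, which is the sense in which two
bits per time interval pass through the single common output.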

Another channel is indicated in Fig. 3. There are three input letters
a, b, c at the left terminal and three input letters A, B, C at the right
terminal. If a is used at the left, the channel from the right is as shown
in the figure, a channel with capacity 1. B or C come through to
corresponding received letters B' and C' while A divides with probability
1/2 between these. If b or c is used, the channel from right to left has zero
capacity, all letters A, B, C dividing equally between B' and C'. In the
reverse direction, the situation is similar with capital letters exchanged
for small letters. Thus there is a direct conflict between sending
information to the right or the left. Any point in the triangular region can
be attained but, we suspect, nothing outside. To obtain a point on the
diagonal boundary, say $C_{12} = x$ and $C_{21} = 1 - x$, the channel may be
used $x$ of the time to the right (that is, the right hand operator uses A)
and $1 - x$ of the time to the left (the left hand operator uses a). In each
case, the other operator sends at full capacity.

In the general T-terminal memoryless channel, essentially this
apportionment of time may be carried out to prove the following theorem.
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Theorem;    The  capacity  surface  is  convex  outward.    That  is,,  if  the  sets 
$C_{i\sigma}$ and $C'_{i\sigma}$ can be attained (where $i$ ranges over the
terminals and $\sigma$ over subsets of terminals excluding $i$), then the set
of capacities

$$ C''_{i\sigma} = \lambda C_{i\sigma} + (1 - \lambda) C'_{i\sigma}, \qquad 0 \le \lambda \le 1, $$

can also be attained.

This is proved readily by subdividing the time between the coding systems
which give $C_{i\sigma}$ and $C'_{i\sigma}$ in the ratios $\lambda$ and
$1 - \lambda$. If these are irrational, they may of course be approximated by
a sequence of rationals.
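
The time-sharing argument can be illustrated numerically. The sketch below is
only an illustration: the dictionary keys, the function names `time_share` and
`schedule`, and the block-length cap are assumptions, and the two rate sets
used in the example are the corner points of the triangular region described
for Fig. 3.

```python
import math
from fractions import Fraction


def time_share(rates_a, rates_b, lam):
    """Rates obtained by using coding system a for a fraction lam of the time
    and coding system b for the remaining fraction 1 - lam."""
    return {key: lam * rates_a[key] + (1 - lam) * rates_b[key] for key in rates_a}


def schedule(lam, max_block=1000):
    """Approximate lam by a rational a / (a + b) with a + b <= max_block and
    return (a, b): uses of system a and of system b in each block."""
    frac = Fraction(lam).limit_denominator(max_block)
    return frac.numerator, frac.denominator - frac.numerator


if __name__ == "__main__":
    to_right = {"C12": 1.0, "C21": 0.0}
    to_left = {"C12": 0.0, "C21": 1.0}
    print(time_share(to_right, to_left, 0.25))
    print(schedule(0.25))                        # exact: 1 use of a, 3 of b
    print(schedule(1 / math.sqrt(2), 100))       # rational approximation
```

An irrational ratio is handled by increasing `max_block`, which yields the
sequence of rational approximations mentioned in the proof.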


Conditions  for  Constant  Mutual  Information 


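Theorem: In a channel with $p_i(j)$ matrix and $P_i$ input probabilities,
necessary and sufficient conditions that the mutual information be constant
are that

(1) $p_i(j) = r_j$, a function of $j$ only (for every $i$ that can cause $j$),

(2) $\sum_{i \in S_j} P_i = h$, independent of $j$, where $S_j$ is the set of
input letters that can cause output letter $j$.

We also have $\sum_j r_j = h^{-1} = e^{I}$, where $I$ is the constant
information value.

Proof: Suppose $\log \frac{p_i(j)}{\sum_k P_k p_k(j)} = I$. Then
$p_i(j) = e^{I} \sum_k P_k p_k(j)$, a function of $j$ only. Also if $q_j(i)$ is
the conditional probability of $i$ given $j$, then
$q_j(i) = P_i p_i(j) / \sum_k P_k p_k(j) = P_i e^{I}$, and

$$ 1 = \sum_{i \in S_j} q_j(i) = e^{I} \sum_{i \in S_j} P_i , $$

so $\sum_{i \in S_j} P_i = e^{-I}$, independent of $j$.

To prove the sufficiency, assume (1) and (2). From (1)

$$ \frac{p_i(j)}{\sum_k P_k p_k(j)} = \lambda_j , $$

a function of $j$ only. Now summing $P_i \lambda_j = q_j(i)$ over
$i \in S_j$ and using (2),

$$ h \lambda_j = 1 , $$

so $\lambda_j$ is $h^{-1}$, independent of $j$. Hence $I = \log h^{-1}$.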
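
A small numerical example may make the two conditions concrete. The channel
below is an assumed illustration, not one taken from the notes: three inputs,
each going to two of three outputs with probability 1/2, used with equal
input probabilities. The function names are hypothetical.

```python
import math


def pairwise_information(P_in, p):
    """log( p_i(j) / Q_j ) for every pair (i, j) with p_i(j) > 0, where
    Q_j = sum_k P_k p_k(j) is the probability of output letter j."""
    n_in, n_out = len(P_in), len(p[0])
    Q = [sum(P_in[k] * p[k][j] for k in range(n_in)) for j in range(n_out)]
    return {(i, j): math.log(p[i][j] / Q[j])
            for i in range(n_in) for j in range(n_out) if p[i][j] > 0}


def input_mass_per_output(P_in, p):
    """For each output j, the total probability of inputs that can cause j."""
    n_in, n_out = len(P_in), len(p[0])
    return [sum(P_in[i] for i in range(n_in) if p[i][j] > 0) for j in range(n_out)]


if __name__ == "__main__":
    P_in = [1 / 3, 1 / 3, 1 / 3]
    p = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.5, 0.0, 0.5]]
    print(pairwise_information(P_in, p))   # every value equals log 1.5
    print(input_mass_per_output(P_in, p))  # every value equals h = 2/3
```

Here every nonzero entry of a column is 1/2, the mass h is 2/3 for each
output, and the constant information is log(3/2) = log(1/h).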


Simple Proof that $H_y(x) \le H(x)$

We wish to prove that

$$ -\sum_{i,j} P(i, j) \log p_i(j) \le -\sum_j P(j) \log P(j) . $$

We will prove this for each particular $j$; summing on $j$ will then give the
desired result. Thus we will show

$$ -\sum_i P(i, j) \log p_i(j) \le -P(j) \log P(j) $$

or

$$ -\sum_i P(i)\, p_i(j) \log p_i(j) \le -\sum_i P(i)\, p_i(j) \log \sum_i P(i)\, p_i(j) . $$

Consider $\varphi(x) = x \log x$. This function is convex downward for
$x > 0$ since $\varphi''(x) = \frac{1}{x} > 0$. Therefore it satisfies the
inequality (see Hardy, Littlewood and Polya, "Inequalities", p. 74)

$$ \varphi\left( \sum_i q_i x_i \right) \le \sum_i q_i \varphi(x_i), \qquad \text{where } \sum_i q_i = 1 . $$

Take $x_i = p_i(j)$ and $q_i = P(i)$:

$$ \sum_i P(i)\, p_i(j) \log \sum_i P(i)\, p_i(j) \le \sum_i P(i)\, p_i(j) \log p_i(j) . $$

This is, after multiplication by $(-1)$ and summation on $j$, the desired
inequality. Equality occurs only if all $p_i(j)$ for a given $j$ are equal.
Then $p_i(j) = P(j)$ and $P(i, j) = P(i) P(j)$. That is, the two events are
independent.
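
The inequality and its equality condition can be checked numerically. The
sketch below uses the notation of the proof (input probabilities P(i) and
transition probabilities p_i(j)); the function name `entropies` and the
example matrices are illustrative assumptions.

```python
import math


def entropies(P_in, p):
    """Return (conditional entropy of the output given the input,
    entropy of the output), both in nats, for input probabilities P_in[i]
    and transition probabilities p[i][j]."""
    n_in, n_out = len(P_in), len(p[0])
    P_out = [sum(P_in[i] * p[i][j] for i in range(n_in)) for j in range(n_out)]
    h_cond = -sum(P_in[i] * p[i][j] * math.log(p[i][j])
                  for i in range(n_in) for j in range(n_out) if p[i][j] > 0)
    h_out = -sum(q * math.log(q) for q in P_out if q > 0)
    return h_cond, h_out


if __name__ == "__main__":
    # Dependent case: the conditional entropy is strictly smaller.
    print(entropies([0.3, 0.7], [[0.9, 0.1], [0.2, 0.8]]))
    # Independent case (all rows equal): the two entropies coincide.
    print(entropies([0.3, 0.7], [[0.4, 0.6], [0.4, 0.6]]))
```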


The  Central  Limit  Theorem  with  Large  Deviations 

The central limit theorem states that under certain general conditions
the sum of $n$ independent random variables is approximately gaussian in the
neighborhood of its mean value when $n$ is large. The most common theorems
of this class give good estimates of the probability at deviations of the
order of $\sqrt{n}$ from the mean, while more advanced results with added
terms (for example, the results on p. 147 of Feller, Probability Theory and
Its Applications) allow somewhat larger deviations but still require that the
deviation from the mean divided by $n$ approach zero for the estimate to be
asymptotic to the correct value with large $n$.

We will develop asymptotic formulas under certain conditions for the
probability density, the probabilities of the tails of the distributions,
etc., for arbitrary deviations. In the usual central limit theorem, the
behavior near the mean is related to the characteristic functions or, as
we prefer here, the moment-generating functions near the value zero. It is
interesting that the results here show that the distribution remote from the
mean is in a very similar fashion related to the moment-generating functions
at arguments away from zero. Thus we are able to attach a fairly direct
significance to the value and derivatives of the moment-generating functions
at non-zero arguments. Indeed, the method of derivation of our asymptotic
estimates is a kind of manipulation trick whereby points away from zero are
translated into zero. This device is due to Esscher and has been used by
Cramer in a manner similar to our analysis. However our results go further
than those of Cramer, most of whose work applied only near the mean of the
distribution.


Let $F(x) = \Pr\{u \le x\}$ be the distribution function for the random
variable $u$. The moment-generating function is then

$$ \varphi(s) = \int_{-\infty}^{\infty} e^{sx}\, dF(x) . $$

Let this converge in the range $A < s < B$ (either or both $A$ and $B$ may be
infinite). We are interested only in cases where $B > 0 > A$. This includes
distribution functions which are bounded in range or which approach zero and
one exponentially or faster, as with the gaussian distribution or the
distribution whose density is $\frac{1}{2} e^{-|x|}$.

The moment-generating function is an analytic function of $s$ (thought of
as a complex variable) in the strip where $A < \mathrm{Re}\, s < B$. If $n$
variables, all independent and distributed according to the same $F(x)$, are
added, the sum $x$ is distributed according to the n-fold convolution
$F_n(x)$. The moment-generating function of $F_n(x)$ is

$$ \varphi_n(s) = \left[ \varphi(s) \right]^n . $$

We wish to estimate $F_n(x)$ when $n$ is large.

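Consider a new random variable $v$ whose distribution function $G(x)$ is
defined by

$$ G(x) = \frac{\int_{-\infty}^{x} e^{s_0 y}\, dF(y)}{\int_{-\infty}^{\infty} e^{s_0 y}\, dF(y)} . $$

Here $s_0$ is an arbitrary real constant lying between $A$ and $B$.

The moment-generating function for $G(x)$ is

$$ \varphi_G(s) = \int_{-\infty}^{\infty} e^{sx}\, dG(x) = \frac{\varphi(s + s_0)}{\varphi(s_0)} . \qquad (1) $$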

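The mean and variance of the $G$ distribution may be found from the first and
second derivatives of $\varphi_G(s)$ evaluated at $s = 0$. Thus

$$ \text{mean} = \frac{\varphi'(s_0)}{\varphi(s_0)}, \qquad \text{variance} = \frac{\varphi''(s_0)}{\varphi(s_0)} - \left[ \frac{\varphi'(s_0)}{\varphi(s_0)} \right]^2 . $$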

Now suppose $n$ variables, all independent and distributed according to
$G(x)$, are added. The sum $z$ will be distributed according to $G_n(x)$ with
the moment-generating function

$$ \left[ \frac{\varphi(s + s_0)}{\varphi(s_0)} \right]^n . $$

This implies that

$$ dG_n(x) = \frac{e^{s_0 x}\, dF_n(x)}{\left[ \varphi(s_0) \right]^n} , $$

since addition of $s_0$ in the argument of the generating function
corresponds to multiplication by $e^{s_0 x}$ in the distribution function.

Thus the distribution $G(x)$ after n-fold convolution is still closely
related to the n-fold convolution of $F(x)$:

$$ dF_n(x) = \left[ \varphi(s_0) \right]^n e^{-s_0 x}\, dG_n(x) . $$

The basic method of using this relation to study the behavior of the
distribution $F_n(x)$ is as follows. A value of $s_0$ is chosen in such a way
as to make the mean of the $G_n$ distribution occur at the value $x$ of $F_n$
in which we are interested. When this is done, $G_n(x)$ can be estimated well
from the ordinary central limit theorems, since these are particularly good
at and near the mean. The relation between $F_n$ and $G_n$ is then used to
translate estimates of $G_n$ behavior into estimates of $F_n$ behavior.

It is convenient to use in place of the moment-generating function
$\varphi(s)$ its logarithm, which we will denote by $\mu(s)$. This function is
sometimes called the semi-invariant generating function. In terms of $\mu(s)$
we have

$$ dF_n(x) = e^{n\mu(s_0)}\, e^{-s_0 x}\, dG_n(x) . $$

The successive derivatives of $\mu(s)$ evaluated at zero are called the
semi-invariants of the $F$ distribution. In particular,

$$ \mu(0) = \log \varphi(0) = 0 , $$
$$ \mu'(0) = \int x\, dF(x) = \text{mean of } F \text{ distribution} , $$
$$ \mu''(0) = \int x^2\, dF(x) - \left( \int x\, dF(x) \right)^2 = \sigma^2 \text{ of } F \text{ distribution} . $$


For the $G(x)$ distribution, the log moment generating function $\mu_G(s)$ is
given by (taking the logarithm of (1))

$$ \mu_G(s) = \mu(s + s_0) - \mu(s_0) . $$

Consequently, for all derivatives (using a superscript to denote
differentiation)

$$ \mu_G^{(k)}(0) = \mu^{(k)}(s_0), \qquad k = 1, 2, \ldots $$

In words, the semi-invariants of the $G$ distribution are the derivatives
of $\mu(s)$ for the $F$ distribution evaluated at $s_0$. In particular, the
mean and variance of the $G$ distribution are $\mu'(s_0)$ and $\mu''(s_0)$.
The mean and variance of the $G_n$ distribution are, similarly, $n\mu'(s_0)$
and $n\mu''(s_0)$.
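
The relation between the tilted distribution and the derivatives of the log
moment generating function can be checked on a small discrete example. This
is a sketch under stated assumptions: a three-point distribution chosen only
for illustration, derivatives taken by finite differences, and hypothetical
function names.

```python
import math


def log_mgf(values, probs, s):
    """Log moment generating function: log of sum_x p(x) exp(s x)."""
    return math.log(sum(p * math.exp(s * x) for x, p in zip(values, probs)))


def tilt(values, probs, s0):
    """Probabilities of the G distribution: proportional to exp(s0 x) p(x)."""
    weights = [p * math.exp(s0 * x) for x, p in zip(values, probs)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_var(values, probs):
    """Mean and variance of a discrete distribution."""
    m = sum(p * x for x, p in zip(values, probs))
    v = sum(p * (x - m) ** 2 for x, p in zip(values, probs))
    return m, v


def derivative(f, s, h=1e-4):
    """Central finite-difference estimate of f'(s)."""
    return (f(s + h) - f(s - h)) / (2 * h)


def second_derivative(f, s, h=1e-4):
    """Central finite-difference estimate of f''(s)."""
    return (f(s + h) - 2 * f(s) + f(s - h)) / (h * h)


if __name__ == "__main__":
    values, probs, s0 = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2], 0.7
    g = tilt(values, probs, s0)
    mu = lambda s: log_mgf(values, probs, s)
    print("tilted mean, variance:", mean_var(values, g))
    print("derivatives at s0    :", derivative(mu, s0), second_derivative(mu, s0))
```

The two printed pairs agree to the accuracy of the finite differences, which
is the statement that the mean and variance of G are the first and second
derivatives of the log moment generating function at the tilting point.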

Note that the operation of forming the new distribution function
$G(x)$ (or the corresponding new random variable) from a given distribution
function $F(x)$ (or its random variable) is a group operation. Thus,
if we let $T_s$ denote the operation which applied to $F(x)$ gives $G(x)$,

$$ T_s F(x) = \frac{\int_{-\infty}^{x} e^{sy}\, dF(y)}{\int_{-\infty}^{\infty} e^{sy}\, dF(y)} , $$

then the $T_s$ form an additive Abelian group isomorphic to the additive
group for real numbers,

$$ T_{s_1} T_{s_2} = T_{s_1 + s_2} , \qquad T_0 = I . $$

The operation $T_s$ is distributive over the binary operation of convolution
(which itself is commutative and associative). Thus, if we denote convolution
of two distribution functions by an asterisk and repeated convolution of
the same distribution by an asterisk preceding the exponent, we have

$$ T_s (F * G) = (T_s F) * (T_s G) , $$
$$ T_s (F^{*n}) = (T_s F)^{*n} . $$

This last equation, when we operate on both sides by $T_{-s}$, gives the basic
result we have used in estimating tails of distributions,

$$ F^{*n} = T_{-s} (T_s F)^{*n} . $$

If we think of the operation $T_s F = G$ as producing a new probability
measure for the random variable $x$, then there is a one to one correspondence
between points in the two probability spaces involved, the $F$ space and the
$G$ space, and also between points in the product spaces of $F$ with itself
$n$ times and $G$ with itself $n$ times. The probability measures in the two
spaces are very closely related. If a point in the $F$ space has value $x$ and
probability $P$, the corresponding point in the $G$ space has value $x$ and
probability $Q = e^{sx} P / \int e^{sx}\, dF(x)$. If we select a subset $S$ of
points whose $x$ values all lie between $A$ and $B$, then we will have (for
$s \ge 0$)

$$ k e^{sA} P(S) \le Q(S) \le k e^{sB} P(S) , $$

where $k^{-1} = \int e^{sx}\, dF(x)$.

The  Chernoff  Inequality 

To illustrate the use of the $G$ distribution in estimating the tail of
the $F_n$ distribution, we will first give a crude but simple and useful bound
on the tail due to Chernoff, who proved it by a different method. We have

$$ F_n(x) = e^{n\mu(s_0)} \int_{-\infty}^{x} e^{-s_0 y}\, dG_n(y) . $$

If $s_0 \le 0$, the maximum of $e^{-s_0 y}$ occurs at $y = x$. Thus

$$ F_n(x) \le e^{n\mu(s_0)}\, e^{-s_0 x} \int_{-\infty}^{x} dG_n(y) \le e^{n\mu(s_0) - s_0 x}, \qquad s_0 \le 0 . $$

This is true for any $x$ and any $s_0 \le 0$, but to obtain the most favorable
bound we should choose $s_0$ so as to minimize $n\mu(s_0) - x s_0$ (for the
$x$ in question). Remembering that $\mu(s)$ is analytic and that
$\mu''(s) > 0$ (since it is a variance) the necessary and sufficient condition
for a minimum is that $n\mu'(s_0) = x$. This will have a unique solution in
$s_0$. However, it is more convenient to express our result parametrically in
terms of $s_0$, or, dropping the subscript, in terms of $s$. Thus

$$ F_n(n\mu'(s)) \le e^{n(\mu(s) - s\mu'(s))}, \qquad s < 0 . $$

In a similar fashion, by integrating from $x$ to $\infty$, we obtain a bound
on the tail in the positive direction of exactly the same type. Combining
these results we have the following. If $F_n(x)$ is the distribution function
of the sum of $n$ identically distributed random variables, each with log
moment generating function $\mu(s)$ which exists for $A < s < B$, then

$$ F_n(n\mu'(s)) \le e^{n(\mu(s) - s\mu'(s))}, \qquad A < s < 0 , $$
$$ 1 - F_n(n\mu'(s)) \le e^{n(\mu(s) - s\mu'(s))}, \qquad 0 < s < B . $$
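
As a numerical illustration, the bound can be compared with an exact tail for
a sum of binary variables. This is a sketch under stated assumptions: each
variable is taken to be 1 with probability p and 0 otherwise (a choice made
here only for illustration), the bound is assumed to read
exp(n(mu(s) - s mu'(s))) for s > 0, and the function names are hypothetical.

```python
import math


def mu(s, p):
    """Log moment generating function of a variable equal to 1 w.p. p, else 0."""
    return math.log(1.0 - p + p * math.exp(s))


def mu_prime(s, p):
    """Derivative of mu with respect to s."""
    e = p * math.exp(s)
    return e / (1.0 - p + e)


def chernoff_upper_tail(n, s, p):
    """Return (x, bound): x = n mu'(s) and the bound on Pr{sum >= x}, s > 0."""
    x = n * mu_prime(s, p)
    bound = math.exp(n * (mu(s, p) - s * mu_prime(s, p)))
    return x, bound


def exact_upper_tail(n, x, p):
    """Exact Pr{sum >= x} for the binomial sum of n such variables."""
    k0 = math.ceil(x - 1e-12)
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k0, n + 1))


if __name__ == "__main__":
    n, p = 100, 0.3
    for s in (0.25, 0.5, 1.0):
        x, bound = chernoff_upper_tail(n, s, p)
        print(f"s={s:4.2f}  x={x:6.2f}  exact={exact_upper_tail(n, x, p):.3e}  "
              f"bound={bound:.3e}")
```

The exact tail stays below the bound at every s, and the two differ by a
factor that is small on the logarithmic scale.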

These bounds are very useful in that they are extremely simple to
compute. Furthermore, they are asymptotic to $F_n$ or $1 - F_n$ in the sense
that, as $n \to \infty$, the logarithms of the bounds are asymptotic to the
logarithms of $F_n$ or $1 - F_n$ (in the respective $s$ ranges), as we will see
later. Hence if we are interested only in the logarithm of $F_n$ for large
$n$, the Chernoff bound gives the correct asymptotic behavior.

In the following section we take up the estimation of $F_n(x)$ and
$1 - F_n(x)$ in more detail.

Upper  and  Lower  Bounds  on  the  Tails  of  Distributions 

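Theorem: The distribution of the sum of $n$ identically distributed
independent random variables satisfies

$$ \left. \begin{array}{ll} F_n(n\mu'(s)), & s < 0 \\ 1 - F_n(n\mu'(s)), & s > 0 \end{array} \right\} \le \frac{e^{n(\mu(s) - s\mu'(s))}}{|s| \sqrt{2\pi n \mu''(s)}} \left[ 1 + 2\sqrt{2\pi}\, c\, |s|\, \frac{\left( \mu^{iv}(s) + 3\mu''(s)^2 \right)^{3/4}}{\mu''(s)} \right] $$

where $\mu(s)$ is the log moment generating function of $F(x)$, $\mu'$,
$\mu''$ and $\mu^{iv}$ are derivatives and $c$ is an absolute constant, the
constant in the Berry theorem relating to the approximation in the central
limit theorem with error less than or equal to
$\frac{c \rho_3}{\sigma^3 \sqrt{n}}$. Also $c$ may be replaced in the
inequality by $3 \ln n$.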


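Proof: We have, for $s > 0$,

$$ 1 - F_n(n\mu'(s)) = e^{n\mu(s)} \int_{n\mu'(s)}^{\infty} e^{-sx}\, dG_n(x) . $$

On making the substitution

$$ y = \frac{x - n\mu'(s)}{\sqrt{n\mu''(s)}} $$

and writing $H_n(y)$ for $G_n\left( \sqrt{n\mu''(s)}\, y + n\mu'(s) \right)$ we
obtain an $H_n$ distribution with mean at zero and variance one, suitable for
application of ordinary central limit results. The equality above becomes

$$ 1 - F_n(n\mu'(s)) = e^{n(\mu(s) - s\mu'(s))} \int_0^{\infty} e^{-s\sqrt{n\mu''(s)}\, y}\, dH_n(y) . $$

$H_n(y)$ can be estimated from the Cramer-Berry-Esseen theorem. Thus

$$ H_n(y) = \Phi(y) + B(y), \qquad |B(y)| \le \frac{c \rho_3}{\sigma^3 \sqrt{n}} , $$

where $\sigma^2 = \mu''(s)$ and $\rho_3$ is the third absolute moment (about
the mean) of the $G$ distribution.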


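The integral then breaks into two parts. First we have:

$$ \int_0^{\infty} e^{-s\sqrt{n\mu''(s)}\, y}\, d\Phi(y) = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-y^2/2 - s\sqrt{n\mu''(s)}\, y}\, dy $$

$$ = \frac{1}{\sqrt{2\pi}}\, e^{s^2 n\mu''(s)/2} \int_0^{\infty} e^{-\left( y + s\sqrt{n\mu''(s)} \right)^2 / 2}\, dy = e^{s^2 n\mu''(s)/2}\, \Phi\left( -s\sqrt{n\mu''(s)} \right) $$

$$ \le \frac{e^{s^2 n\mu''(s)/2}\, e^{-s^2 n\mu''(s)/2}}{\sqrt{2\pi}\; s\sqrt{n\mu''(s)}} = \frac{1}{s\sqrt{2\pi n\mu''(s)}} . $$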

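The second integral involving $dB(y)$ may be bounded by integrating by
parts,

$$ \int_0^{\infty} e^{-s\sqrt{n\mu''(s)}\, y}\, dB(y) = \left[ e^{-s\sqrt{n\mu''(s)}\, y} B(y) \right]_0^{\infty} + s\sqrt{n\mu''(s)} \int_0^{\infty} B(y)\, e^{-s\sqrt{n\mu''(s)}\, y}\, dy $$

$$ \le \frac{c \rho_3}{\sigma^3 \sqrt{n}} + s\sqrt{n\mu''(s)}\; \frac{c \rho_3}{\sigma^3 \sqrt{n}}\; \frac{1}{s\sqrt{n\mu''(s)}} = \frac{2 c \rho_3}{\sigma^3 \sqrt{n}} $$

$$ = \frac{2c\; s\sqrt{2\pi n\mu''(s)}\; \rho_3}{s\sqrt{2\pi n\mu''(s)}\; (\mu''(s))^{3/2} \sqrt{n}} . $$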

Collecting these terms, we obtain a bound for the tail of the distribution:

$$ 1 - F_n(n\mu'(s)) \le \frac{e^{n(\mu - s\mu')}}{s\sqrt{2\pi n\mu''}} \left[ 1 + 2\sqrt{2\pi}\, c\, s\, \frac{\rho_3}{\mu''} \right] . $$

By a well-known inequality $\rho_3 \le (\rho_4)^{3/4}$, and
$\rho_4 = \mu^{iv} + 3(\mu'')^2$. Consequently

$$ \frac{\rho_3}{(\mu'')^{3/2}} \le \frac{\left( \mu^{iv} + 3(\mu'')^2 \right)^{3/4}}{(\mu'')^{3/2}} . $$

This results in the final inequality involving only $n$, $s$ and $\mu$ and its
derivatives (together with the unknown absolute constant $c$). Since the
original Lyapunov theorem (with constant estimated by Cramer) gives an
inequality for $B(y)$ as follows

$$ |B(y)| \le \frac{3 \rho_3 \log n}{\sigma^3 \sqrt{n}} , $$

we may, in our inequalities, replace $c$ by $3 \log n$. This makes them
completely definite, although at a certain loss in order of magnitude as a
function of $n$ when $n$ is large.

To estimate a lower bound on the tail, the method is identical up to
the point where we must estimate the following integral,

$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dH_n(y).$$

Again using the theorem involving $c\rho_3/\sqrt{n}$ it is evident that the monotone
increasing function $H_n(y)$ which would minimize this integral, subject to
being within $c\rho_3/\sqrt{n}$ from $\Phi(y)$, would be that shown in the figure.


This function starts at zero as high above $\Phi(y)$ as possible and is constant
at this value as long as possible. It then increases as slowly as possible.
It is easily shown that any other permissible $H_n(y)$ gives a larger integral
than this function; when changed into this function in the obvious way the
integral is decreased. In the figure the corner in the curve occurs at A, which
is such that $\Phi(A) - \Phi(0) = 2c\rho_3/\sqrt{n}$. To obtain a simple estimate of A
which is on the safe side (that is, larger than the actual A) we may approximate
$\Phi(y)$ by a straight line passing through 1/2 at y = 0 and of slope 1/4. This
will lie below $\Phi(y)$ out to y = 1.86. Hence if the A computed from this
straight line, namely $A = 8c\rho_3/\sqrt{n}$, is less than or equal to 1.86, the
estimate is safe. If not, the more elaborate formula involving $\Phi(A)$ may be
used. In any case, our lower bound integral becomes

$$\int_A^\infty e^{-s\sqrt{n\mu''}\,y}\, d\Phi(y).$$
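The straight-line argument above can be checked numerically. The following is a minimal sketch, not part of the original manuscript: the function names are illustrative, and `delta` stands for the quantity $2c\rho_3/\sqrt{n}$, whose actual value depends on the unknown constant c.

```python
import math

def phi_cdf(y):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def line(y):
    """Straight line through 1/2 at y = 0 with slope 1/4."""
    return 0.5 + 0.25 * y

def safe_corner(delta):
    """Corner estimate from the line: line(A) - line(0) = delta gives A = 4 * delta."""
    return 4.0 * delta

def actual_corner(delta):
    """Solve phi_cdf(A) - phi_cdf(0) = delta by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi_cdf(mid) - 0.5 < delta:
            lo = mid
        else:
            hi = mid
    return hi

# The line stays below the normal distribution function on (0, 1.86].
line_is_below = all(line(0.01 * k) <= phi_cdf(0.01 * k) for k in range(1, 187))
```

With `delta = 0.1`, for instance, `actual_corner` returns a value near 0.25 while `safe_corner` returns 0.4, so the estimate errs on the large side as the text requires.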

On  completing  the  square,  as  before,  this  becomes 

s2nu-' '    —    s2n^ »  (s7np,"  -  A)2 

e~~2        4.(8^  np."  -  A)£  e    2  e  2 


(e^nuTT  -  A)*V'2n 


(s^r^i"  -  A)  ^np." 


2 

exp  (-A  sfl3T«  + 


Collecting these results we have the following:

Theorem: If $A = 8c\rho_3/\sqrt{n}$ is less than 1.86,


Fn  (n^(s))~)       1      •         en(^(a)  -  s  ^(s))       e"  A  ^  *  Tj 

,        tf\Jfi£px    '  (f^-A/sfn) 
1-Fn  (n»*'(s);j 


Asymptotic  Behavior  of  the  Distribution  Function 

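Theorem: Let n random variables have the same distribution function
F(x), the logarithm of the moment generating function $\mu(s)$ existing for
A < s < B where A < 0 < B. Let $F_n(x)$ be the distribution function for
the sum of these random variables.

(1) If F(x) is not a lattice distribution, we have asymptotically
as $n \to \infty$

$$F_n(n\mu'(s)) \sim \frac{e^{n(\mu(s) - s\mu'(s))}}{|s|\sqrt{2\pi n\mu''(s)}}, \qquad A < s < 0$$

$$1 - F_n(n\mu'(s)) \sim \frac{e^{n(\mu(s) - s\mu'(s))}}{s\sqrt{2\pi n\mu''(s)}}, \qquad 0 < s < B$$

(2) If F(x) is a lattice distribution with maximum span h and $\Delta$
is the distance from $n\mu'(s)$ to the next lattice point in the
direction away from the mean, then asymptotically as $n \to \infty$

$$F_n(n\mu'(s)) \sim \frac{h\, e^{-|s|\Delta}}{1 - e^{-|s|h}} \cdot \frac{e^{n(\mu(s) - s\mu'(s))}}{\sqrt{2\pi n\mu''(s)}}, \qquad A < s < 0$$

$$1 - F_n(n\mu'(s)) \sim \frac{h\, e^{-|s|\Delta}}{1 - e^{-|s|h}} \cdot \frac{e^{n(\mu(s) - s\mu'(s))}}{\sqrt{2\pi n\mu''(s)}}, \qquad 0 < s < B$$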
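As an illustration of the lattice case, the asymptotic expression can be compared with an exact tail for sums of fair coin tosses (span h = 1). This is a sketch under assumptions not in the manuscript: the choice of a fair coin, the function names, and the particular values n = 2000 and x = 1200.5 are all illustrative.

```python
import math

def exact_upper_tail(n, k0):
    """P(S_n >= k0), where S_n is the number of heads in n fair coin tosses."""
    return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2 ** n

def lattice_tail_estimate(n, x, delta, h=1.0):
    """Lattice asymptotic for 1 - F_n(x) with x = n * mu'(s), fair-coin summands."""
    a = x / n                          # required value of mu'(s)
    s = math.log(a / (1.0 - a))        # solves mu'(s) = a when p = 1/2
    mu = math.log(0.5 + 0.5 * math.exp(s))
    mu2 = a * (1.0 - a)                # mu''(s) for the tilted coin
    coeff = h * math.exp(-s * delta) / (1.0 - math.exp(-s * h))
    return coeff * math.exp(n * (mu - s * a)) / math.sqrt(2.0 * math.pi * n * mu2)

n, x = 2000, 1200.5
delta = 1201 - x                       # distance to the next lattice point
exact = exact_upper_tail(n, 1201)      # P(S_n > 1200.5)
approx = lattice_tail_estimate(n, x, delta)
ratio = exact / approx
```

For these values the ratio should be close to one; how close depends on n, since the theorem is only an asymptotic statement.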

Proof: Consider first the non-lattice case. The two results s > 0 and
s < 0 are substantially the same. We prove the s > 0 case. As in the theorems
giving upper and lower bounds, a change of variable, $y = (x - n\mu'(s))/\sqrt{n\mu''(s)}$,
reduces the problem to that of estimating the following integral

$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dH_n(y).$$

We now use the Cramér-Esseen theorem (Gnedenko and Kolmogoroff p. 210)
which states, in effect, that for any $\epsilon > 0$ there exists $n_0$ such that
when $n > n_0$ we have

$$H_n(y) = \Phi(y) + U(y) + B(y)$$

with $|B(y)| < \epsilon/\sqrt{n}$. Thus the integral may be written as a sum of three
integrals:

$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, \bigl[d\Phi(y) + dU(y) + dB(y)\bigr]$$

where $U(y) = \dfrac{\alpha_3}{6\sigma^3\sqrt{n}} \cdot \dfrac{(1 - y^2)\, e^{-y^2/2}}{\sqrt{2\pi}}$. The first integral may be
evaluated exactly, on completing the square in the exponent. Its value is

$$e^{s^2 n\mu''/2}\,\bigl[1 - \Phi\bigl(s\sqrt{n\mu''}\bigr)\bigr].$$

Using the well-known asymptotic formula for $1 - \Phi(x)$, this expression is
asymptotic to

$$e^{s^2 n\mu''/2} \cdot \frac{e^{-s^2 n\mu''/2}}{s\sqrt{2\pi n\mu''}} = \frac{1}{s\sqrt{2\pi n\mu''}}.$$


The second integral is $o(1/\sqrt{n})$. In fact, let the integral be divided into
two ranges

$$\int_0^{n^{-1/6}} + \int_{n^{-1/6}}^{\infty}.$$

The first integral is $o(1/\sqrt{n})$ because the total change in U(y) in the
interval is $o(1/\sqrt{n})$ while the integrand is bounded. Note that U(y), in
addition to $\sqrt{n}$ in the denominator, is flat at y = 0. Hence, as the interval
of integration approaches zero, $\Delta U(y)$ is $o(1/\sqrt{n})$.

The integral $\int_{n^{-1/6}}^{\infty}$ is clearly bounded by $e^{-s\sqrt{n\mu''}\, n^{-1/6}} K$ where
K is the total variation of U(y). Since this latter is finite, and in
fact even approaches zero as n increases, the term in question is
certainly $o(1/\sqrt{n})$.

Finally, the last of the three integrals is clearly bounded by $2\epsilon/\sqrt{n}$
and consequently is $o(1/\sqrt{n})$. Thus we conclude that

$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dH_n(y) \sim \frac{1}{s\sqrt{2\pi n\mu''}}$$

and, hence, that as $n \to \infty$ the tail of the original distribution with
s > 0 has the following asymptotic formula

$$1 - F_n(n\mu'(s)) \sim \frac{1}{s\sqrt{2\pi n\mu''(s)}}\, e^{n(\mu(s) - s\mu'(s))}.$$

The analysis for the case of a lattice distribution is quite similar
but involves another term. We use the theorem of Esseen (Gnedenko and
Kolmogoroff, p. 213) which may be phrased for our purposes as follows.
For any $\epsilon > 0$ there exists $n_0$ such that when $n > n_0$ we have

$$H_n(y) = \Phi(y) + \frac{e^{-y^2/2}}{\sqrt{2\pi}} \left[ \frac{\alpha_3 (1 - y^2)}{6\sigma^3\sqrt{n}} + \frac{h}{\sigma\sqrt{n}}\, S\!\left(\frac{y\sigma\sqrt{n} - \Delta}{h}\right) \right] + B(y)$$

with $|B(y)| < \epsilon/\sqrt{n}$. In this formula $\sigma^2$ is the second moment and $\alpha_3$ the
third moment of the H distribution. Also, h is the maximum span, that is, the
largest distance such that all jumps of the H distribution occur at multiples
of this from each other. $\Delta$ is the position of the first jump in the
H distribution in the positive direction. Finally, $S(Z) = [Z] - Z + \frac{1}{2}$, that
is, a saw-tooth function which jumps vertically from $-\frac{1}{2}$ to $+\frac{1}{2}$ at the
integer values of Z and decreases linearly with slope -1 between the
integers.


To estimate $\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dH_n(y)$ we observe first that three of
the terms are identical with those involved in the non-lattice case and
consequently the integral with respect to these functions is asymptotic
to $1/(s\sqrt{2\pi n\mu''(s)})$. The only term to be evaluated is that involving the S
function. This can be written as the sum of two integrals on taking the
differential of the product

$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, \frac{h}{\sqrt{2\pi n\mu''}}\, S\, d\bigl(e^{-y^2/2}\bigr) + \int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, \frac{h}{\sqrt{2\pi n\mu''}}\, e^{-y^2/2}\, dS.$$

The first integral is $o(1/\sqrt{n})$. This can be seen by dividing the range of
integration, 0 to $n^{-1/6}$ and $n^{-1/6}$ to $\infty$, as before. The argument is
essentially the same. In the first interval, the integral is small because of
the flatness of $e^{-y^2/2}$ and because of the $\sqrt{n}$ in the denominator. In the
second interval, the term $e^{-s\sqrt{n\mu''}\,y}$ forces the integral to be small. The
second integral above, integrating on dS, can be divided into an infinite
sum for the jump points of S and an integral $d((y\sqrt{n\mu''} - \Delta)/h)$ for the
sloping parts of S. The infinite sum is

$$\sum_i e^{-s\sqrt{n\mu''}\,y_i}\, \frac{h}{\sqrt{2\pi n\mu''}}\, e^{-y_i^2/2}$$

where the summation is over the $y_i$ which make the argument of the S function
an integer:

$$\frac{y_i\sqrt{n\mu''} - \Delta}{h} = K$$

where K is an integer, or

$$y_i = \frac{hK + \Delta}{\sqrt{n\mu''}}.$$

Thus the sum becomes

$$\frac{h}{\sqrt{2\pi n\mu''}} \sum_{K=0}^{\infty} e^{-s(hK + \Delta)}\, e^{-(hK + \Delta)^2/(2n\mu'')}.$$

To estimate this sum, we use again the device of dividing the range of
summation into a part from 0 to $n^{1/3}$ and a part from $n^{1/3}$ to $\infty$. In
the first of these sums, the exponential with the squared exponent approaches
the constant 1 for all K in the range, and the sum reduces essentially to
a geometric series. Asymptotically, then, the sum becomes

$$\frac{h}{\sqrt{2\pi n\mu''}} \cdot \frac{e^{-s\Delta}}{1 - e^{-hs}}.$$

We have still one further term to estimate, namely

$$-\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, \frac{e^{-y^2/2}}{\sqrt{2\pi}} \cdot \frac{h}{\sqrt{n\mu''}} \cdot \frac{\sqrt{n\mu''}}{h}\, dy = -\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, d\Phi(y).$$

This term is exactly equal to the original $d\Phi(y)$ integral and opposite
in sign (since the saw-tooth slopes are in the negative direction). These
two terms therefore cancel each other, and we are left with only one term
of order $1/\sqrt{n}$. The final answer, then, is that asymptotically, with large
n,

$$\int_0^\infty e^{-s\sqrt{n\mu''}\,y}\, dH_n(y) \sim \frac{h}{\sqrt{2\pi n\mu''}} \cdot \frac{e^{-\Delta s}}{1 - e^{-hs}}.$$

This completes the proof of the theorem.

It may be noted that if the coefficient $\dfrac{h\, e^{-\Delta s}}{1 - e^{-hs}}$ be expanded in a
power series, the first terms are $\frac{1}{s}\{1 + s(\frac{h}{2} - \Delta) + \cdots\}$. Hence, as $h \to 0$
(and also, therefore, $\Delta \to 0$) the lattice result approaches the non-lattice
result, as is to be expected. It may also be seen that with $\Delta = \frac{h}{2}$ the
lattice coefficient is a particularly close approximation to the non-lattice
coefficient since the quantity $\frac{h}{2} - \Delta$ then vanishes. Indeed, in this case,
the coefficient becomes $\frac{1}{s}(1 - \frac{1}{24}(hs)^2 + \cdots)$.
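The series remarks above are easy to confirm numerically. The sketch below is illustrative only; the function names and the sample values of s, h and $\Delta$ are assumptions chosen for the check, not taken from the manuscript.

```python
import math

def lattice_coeff(s, h, delta):
    """Exact lattice coefficient h * exp(-delta * s) / (1 - exp(-h * s))."""
    return h * math.exp(-delta * s) / (1.0 - math.exp(-h * s))

def two_term_series(s, h, delta):
    """First terms of the expansion: (1/s) * (1 + s * (h/2 - delta))."""
    return (1.0 / s) * (1.0 + s * (h / 2.0 - delta))

def centered_series(s, h):
    """Expansion when delta = h/2: (1/s) * (1 - (h*s)**2 / 24)."""
    return (1.0 / s) * (1.0 - (h * s) ** 2 / 24.0)

s, h = 0.5, 0.1
general_gap = abs(lattice_coeff(s, h, 0.03) - two_term_series(s, h, 0.03))
centered_gap = abs(lattice_coeff(s, h, h / 2.0) - centered_series(s, h))
small_span_gap = abs(lattice_coeff(s, 1e-4, 0.0) - 1.0 / s)
```

With $\Delta = h/2$ the agreement with the non-lattice coefficient 1/s is of second order in hs, which is what makes the midpoint choice a particularly close approximation.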


Generalized  Chebycheff  and  Chernoff  Inequalities 


Suppose we have a random vector $(x_1, x_2, \ldots, x_k)$. Let $\phi(u_1, u_2, \ldots, u_k)$
be everywhere non-negative and monotone increasing in all the $u_i$, and
assume $E[\phi(x_1, x_2, \ldots, x_k)]$ exists. Then

$$\mathrm{Prob}\{x_i \ge t_i\ (i = 1, 2, \ldots, k)\} \le \frac{E[\phi(x_1, x_2, \ldots, x_k)]}{\phi(t_1, t_2, \ldots, t_k)} \qquad (1)$$

If we choose for $\phi$ the function

$$\phi(u_1, u_2, \ldots, u_k) = e^{s_1 u_1 + s_2 u_2 + \cdots + s_k u_k} \qquad (\text{all } s_i \ge 0)$$

and let $\mu(s_1, s_2, \ldots, s_k) = \log E(\phi)$ = log (moment generating function of
the distribution), then (1) becomes

$$\mathrm{Prob}\{x_i \ge t_i\ (i = 1, 2, \ldots, k)\} \le e^{\mu(s_1, s_2, \ldots, s_k) - \sum_i s_i t_i} \qquad (2)$$

This bound is minimized by choosing the $s_i$ to satisfy

$$\frac{\partial \mu}{\partial s_i} = t_i \quad (i = 1, 2, \ldots, k) \qquad (3)$$

If the random vectors are the sum of n independent random vectors, each
with the same distribution, then the $\mu^*(s_i)$, say, for the sum vector, is
$n\mu(s_i)$ where $\mu(s_i)$ is the log (moment generating function) for the individual
random vectors. The above result may then be translated

$$\mathrm{Prob}\{x_i^* \ge n t_i\ (i = 1, 2, \ldots, k)\} \le e^{n[\mu(s_i) - \sum_i s_i t_i]}$$

with the best choice of $s_i$, those which satisfy $\dfrac{\partial \mu}{\partial s_i} = t_i$.
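A small numerical case may make the vector bound concrete. The sketch below assumes, purely for illustration, a two-component vector whose coordinates are independent Bernoulli variables; that choice, the parameter values and the function names are not from the manuscript. For such coordinates condition (3) can be solved in closed form.

```python
import math

def log_mgf(s, p):
    """mu(s_1, ..., s_k) for independent Bernoulli(p_i) coordinates (assumed example)."""
    return sum(math.log(1.0 - pi + pi * math.exp(si)) for si, pi in zip(s, p))

def best_s(t, p):
    """Solve d mu / d s_i = t_i; closed form for Bernoulli coordinates with t_i > p_i."""
    return [math.log(ti * (1.0 - pi) / ((1.0 - ti) * pi)) for ti, pi in zip(t, p)]

def chernoff_bound(n, t, p):
    """Bound exp(n * [mu(s) - sum s_i t_i]) on Prob{x_i* >= n t_i for all i}."""
    s = best_s(t, p)
    exponent = log_mgf(s, p) - sum(si * ti for si, ti in zip(s, t))
    return math.exp(n * exponent)

def binomial_upper_tail(n, p, k0):
    """Exact P(Binomial(n, p) >= k0)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(k0, n + 1))

n, p, t = 50, (0.3, 0.5), (0.5, 0.7)
bound = chernoff_bound(n, t, p)
# n * t = (25, 35); the coordinates are independent, so the exact probability factors.
exact = binomial_upper_tail(n, p[0], 25) * binomial_upper_tail(n, p[1], 35)
```

The exact probability lies below the bound, as it must; the bound captures the exponential rate but not the algebraic factor treated in the asymptotic theorem.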


Channels  with  Side  Information  at  the  Transmitter 

Claude  E .  Shannon 
Channels with feedback (1) from the receiving to the transmitting
point are a special case of a situation in which there is additional information
available at the transmitter which may be used as an aid in the forward
transmission system. In Fig. 1 the channel has an input x and an output y.

[Fig. 1: encoder and channel]

There  is  a  second  output  from  the  channel, u,  available  at  the  transmitting 
point,  which  may  be  used  in  the  coding  process.    Thus  the  encoder  has  as 
inputs  the  message  to  be  transmitted,  m,  and  the  side  information  u.  The 
sequence  of  input  letters x  to  the  channel  will  be  a  function  of  the 
available  part  (that  is,  the  past  up  to  the  current  time)  of  these  signals. 

The  signal  u  might  be  the  received  signal  y,  it  might  be  a  noisy 
version  of  this  signal,  or  it  might  not  relate  to  y  but  be  statistically 
correlated  with  the  general  state  of  the  channel.    As  a  practical  example, 
a  transmitting  station  might  have  available  a  receiver  for  testing  the  current 
noise  conditions  at  different  frequencies.    These  results  would  be  used  to 
choose  the  frequency  for  transmission. 

A simple discrete channel with side information is shown in Fig. 2.

[Fig. 2: mod 2 adder and random (0, 1) device]

In this channel, x, y and u are all binary variables; they can be either
zero  or  one.    The  channel  can  be  used  once  each  second.    Immediately  after 


it  is  used  the  random    device  chooses  a  zero  or  one  independently  of 
previous  choices  and  with  probabilities  1/2,  1/2.    This  value  of  u  then 
appears  at  the  transmitting  point.    The  next  x  that  is  sent  is  added 
in  the  channel  modulo  2  to  this  value  of  u  to  give  the  received  y.  If 
the  u  side  information  were  not  available  at  the  transmitter,  the  channel 
would be that of Fig. 3,

[Fig. 3: transition probabilities 1/2]

a  channel  with  capacity  zero.    However,  with  the  side  information  available, 
it  is  possible  to  send  one  bit  per  second  through  the  channel.    The  u 
information  is  used  to  compensate  for  the  noise  inside  by  a  preliminary 
reversal  of  zero  and  one,  as  in  Fig.  4. 


[Fig. 4]
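The compensation scheme of Fig. 4 can be sketched in a few lines. This is an illustrative simulation only; the function names and the use of a seeded pseudo-random generator for the random device are assumptions.

```python
import random

def channel(x, u):
    """Fig. 2 channel: the input x is added modulo 2 to the random state u."""
    return (x + u) % 2

def encode_with_side_information(m, u):
    """Reverse the message bit beforehand when u = 1, so that the channel undoes it."""
    return (m + u) % 2

def transmit(bits, rng):
    """Send each message bit through the channel, using u at the transmitter."""
    received = []
    for m in bits:
        u = rng.randint(0, 1)      # state chosen with probabilities 1/2, 1/2
        x = encode_with_side_information(m, u)
        received.append(channel(x, u))
    return received

rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(1000)]
decoded = transmit(message, rng)
```

Since x + u = m + 2u, the received y equals m whatever the state, which is the one bit per second claimed in the text.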

Without studying the problem of side information in its fullest
generality, which would involve possible historical effects in the channel,
possibly infinite input and output alphabets, etc., we shall consider a
moderately general case for which a simple solution has been found. See
also in this connection Silverman (2).

The memoryless discrete channel with side state information.

We consider a channel which has a finite number of possible states,
$s_1, s_2, \ldots, s_g$. At each use of the channel a new state is chosen,
probability $q_t$ for state $s_t$. This choice is statistically independent
of previous states and previous input or output letters in the channel.
The state is available as side information u at the transmitting point.
When in state $s_t$ the channel acts like a particular discrete channel $K_t$.
Thus, its operation is defined by a set of transition probabilities
$p_{ti}(j)$, t = 1, 2, ..., g, i = 1, 2, ..., a, j = 1, 2, ..., b, where
a is the number of input letters and b the number of output letters. Thus,
abstractly, the channel is described by the set of state probabilities $q_t$
and transition probabilities $p_{ti}(j)$, with $q_t$ the probability of state t
and $p_{ti}(j)$ the probability, if in state t and i is transmitted, that j will be
received.

A block code with M messages (the integers 1, 2, ..., M) may be defined
as follows for such a channel with side information. This definition,
incidentally, is analogous to that for a channel with feedback given previously (1).
If n is the block length of the code, there are n functions
$f_1(m; u_1)$, $f_2(m; u_1, u_2)$, $f_3(m; u_1, u_2, u_3)$, ..., $f_n(m; u_1, u_2, \ldots, u_n)$.
In these functions m ranges over the set of possible messages. Thus
m = 1, 2, ..., M. The $u_i$ all range over the possible side information
alphabet. In the particular case here each $u_i$ = 1, 2, ..., g. Each
function $f_i$ takes values in the alphabet of input letters x of the channel.
The value $f_i(m; u_1, u_2, \ldots, u_i)$ is the input $x_i$ to be used in the code if
the message is m and the side information up to the time corresponding to i
consisted of $u_1, u_2, \ldots, u_i$. This is the mathematical equivalent of saying
that a code consists of a way of determining, for each message m and each
history of side information up to the present, the next transmitted letter.
The important feature here is that only the data available at the time i,
namely m; $u_1, u_2, \ldots, u_i$, may be used in deciding the next transmitted
letter $x_i$, not the side information $u_{i+1}, \ldots, u_n$ yet to appear.

A decoding system for such a code consists of a mapping or function
$h(y_1, y_2, \ldots, y_n)$ of received blocks of length n into messages m; thus h
takes values from 1 to M. It is a way of deciding on a transmitted message
given the received block $y_1, y_2, \ldots, y_n$. For a given set of probabilities
of the messages, there will exist, for a given channel and coding and
decoding system, a calculable probability of error $P_e$; the probability of
a message being encoded and received in such a way that the function h
leads to deciding on a different message. We shall be concerned particularly
with cases where the messages are equiprobable, each having probability
1/M. The rate for such a code is $\frac{1}{n} \log M$. We are interested in
the channel capacity C, that is the largest rate R such that it is
possible to construct codes arbitrarily close to rate R and with
probability of error $P_e$ arbitrarily small.

It may be noted that if the state information were not available at
the transmitting point, the channel would act like a memoryless channel
with transition probabilities given by

$$p_i(j) = \sum_t q_t\, p_{ti}(j).$$

Thus, the capacity $C_1$ under this condition could be calculated by the
ordinary means for memoryless channels. On the other hand, if the state
information were available both at transmitting and receiving points, it
is easily shown that the capacity is then given by $C_2 = \sum_t q_t C_t$ where $C_t$
is the capacity of the memoryless channel with transition probabilities
$p_{ti}(j)$. The situation we are interested in here is intermediate -- the
state information is available at the transmitting point but not at the
receiving point.


Theorem. The capacity of a memoryless discrete channel K with side
state information, defined by $q_t$ and $p_{ti}(j)$, is equal to the capacity of
the memoryless channel K' (without side information) with the same output
alphabet and an input alphabet with $a^g$ input letters $X = (x_1, x_2, \ldots, x_g)$
where each $x_i$ = 1, 2, ..., a. The transition probabilities $r_X(y)$ for the
channel K' are given by

$$r_X(y) = r_{x_1, x_2, \ldots, x_g}(y) = \sum_t q_t\, p_{t x_t}(y).$$

Any code and decoding system for K' can be translated into an equivalent
code and decoding system for K with the same probability of error. Any
code for K has an equivocation of message (conditional entropy per letter
of the message given the received sequence) at least R - C, where C is the
capacity of K'. Thus any code with rate R > C has a probability of error
bounded away from zero (independent of the block length n)

$$P_e \ge \frac{R - C}{6\left(R + \frac{1}{n} \ln \frac{R}{R - C}\right)}.$$

It may be noted that this theorem reduces the analysis of the given
channel K with side information to a memoryless channel K' with more input
letters but without side information. One uses known methods to determine
the capacity of this derived channel and this gives the capacity of the
original channel. Furthermore, codes for the derived channel may be
translated into codes for the original channel with identical probability
of error. (Indeed, all statistical properties of the codes are identical.)
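The reduction can be sketched for the binary channel of Fig. 2. The code below is a minimal illustration, not Shannon's notation: indices are 0-based, and the names `derived_channel` and `translate` are hypothetical.

```python
from itertools import product

def derived_channel(q, p):
    """Rows r_X(y) = sum_t q[t] * p[t][X[t]][y], one for each function X.

    q[t] is the probability of state t and p[t][i][j] the probability that
    input i is received as output j in state t (0-based, an assumption here).
    """
    g, a, b = len(q), len(p[0]), len(p[0][0])
    rows = {}
    for X in product(range(a), repeat=g):
        rows[X] = [sum(q[t] * p[t][X[t]][y] for t in range(g)) for y in range(b)]
    return rows

def translate(X, u):
    """Letter-by-letter translation: the input actually sent on K in state u."""
    return X[u]

q = [0.5, 0.5]
p = [[[1.0, 0.0], [0.0, 1.0]],   # state u = 0: y = x
     [[0.0, 1.0], [1.0, 0.0]]]   # state u = 1: y = x + 1 (mod 2)
rows = derived_channel(q, p)
```

Here K' has $2^2 = 4$ input letters. The two constant functions give rows (1/2, 1/2), while the letters (0, 1) and (1, 0), which send x = u and x = 1 - u, give the noiseless rows (1, 0) and (0, 1). This agrees with the one bit per use found earlier for this channel.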

We first show how codes for K' may be translated into codes for K. A code
word for the derived channel K' consists of a sequence of n letters X from
the X input alphabet of K'. A particular input letter X of this channel
may be recognized as a particular function from the state alphabet to the
input alphabet x of channel K. The full possible alphabet of X consists
of the full set of $a^g$ different possible functions from the state alphabet
with g values to the input alphabet with a values. Thus, each letter
$X = (x_1, x_2, \ldots, x_g)$ of a code word for K' may be interpreted as a
function from state u to input alphabet x. The translation of codes
consists merely of using the input x given by this function of the state
variable. Thus if the state variable u has the value 1, then $x_1$ is used
in channel K; if it were state k, then $x_k$. In other words, the translation
is a simple letter by letter translation without memory effects depending
on previous states.

The codes for K' are really just another way of describing certain of
the codes for K -- namely those where the next input letter x is a function
only of the message m and the current state u, and does not depend on the
previous states.

It might be pointed out also that a simple physical device could be
constructed which, placed ahead of the channel K, makes it look like K'.
This device would have the X alphabet for one input and the state alphabet
for another (this input connected to the u line of Fig. 1). Its output
would range over the x alphabet and be connected to the x line of Fig. 1.
Its operation would be to give an x output corresponding to the X function
of the state u. It is clear that the statistical situations for K and K'
with the translated code are identical. The probability of an input word
for K' being received as a particular output word is the same as that for
the corresponding operation with K. This gives the first part of the theorem.
To prove the second part of the theorem, we will show that in the
channel K, the change in conditional entropy (equivocation) of the message
m at the receiving point when a letter is received cannot exceed C (the
capacity of the channel K'). In Fig. 1, we let m be the message; x, y, u
be the next input letter, output letter and state letter. Let U be the past
sequence of u states from the beginning of the block code to the present,
and Y the past sequence of output letters up to the
current y. We are assuming here a given block code for encoding messages.
The messages are chosen from a set with certain probabilities (not necessarily
equal). Given the statistics of the message source, the coding system, and
the statistics of the channel, these various entities m, x, y, U, Y all
belong to a probability space and the various probabilities involved in
the following calculation are meaningful. Thus the equivocation of message
when Y has been received, H(m|Y), is given by

$$H(m|Y) = -\sum_{m,Y} P(m,Y) \log P(m|Y) = -\langle \log P(m|Y) \rangle.$$

(The symbol $\langle G \rangle$ here and later means the average of G over the
probability space.) The change in equivocation when the next letter
y is received is

$$H(m|Y) - H(m|Y,y) = -\langle \log P(m|Y) \rangle + \langle \log P(m|Y,y) \rangle$$

$$= \left\langle \log \frac{P(m,Y,y)\, P(Y)}{P(m,Y)\, P(Y,y)} \right\rangle$$

$$= \left\langle \log \frac{P(y|m,Y)}{P(y)} \right\rangle - \left\langle \log \frac{P(Y,y)}{P(Y)\, P(y)} \right\rangle$$

$$H(m|Y) - H(m|Y,y) \le \left\langle \log \frac{P(y|m,Y)}{P(y)} \right\rangle \qquad (1)$$

The last reduction is true since the term $\left\langle \log \frac{P(Y,y)}{P(Y)\, P(y)} \right\rangle$ is an
average mutual information and therefore non-negative. Now note
that by the independence requirements of our original system

$$P(y|x,u) = P(y|x,m,u,U) = P(y|x,m,u,U,Y).$$

Now since x is a strict function of m, u, and U (by the coding
system function) we may omit this in the conditioning variables

$$P(y|m,u,U) = P(y|m,u,U,Y)$$

$$\frac{P(y,m,u,U)}{P(m,u,U)} = \frac{P(y,m,u,U,Y)}{P(m,u,U,Y)}$$

Since the new state u is independent of the past, P(m,u,U) = P(u)P(m,U)
and P(m,u,U,Y) = P(u)P(m,U,Y). Substituting and simplifying

$$P(y,u|m,U) = P(y,u|m,U,Y)$$

Summing on u gives

$$P(y|m,U) = P(y|m,U,Y)$$

Hence:

$$H(y|m,U) = H(y|m,U,Y) \le H(y|m,Y)$$

$$-\langle \log P(y|m,U) \rangle \le -\langle \log P(y|m,Y) \rangle$$

Using this in (1)

$$H(m|Y) - H(m|Y,y) \le \left\langle \log \frac{P(y|m,U)}{P(y)} \right\rangle \qquad (2)$$

We now wish to show that P(y|m,U) = P(y|X). Here X is a random
variable specifying the function from u to x imposed by the encoding
operation for the next input x to the channel. Equivalently, X corresponds
to an input letter in the derived channel K'. We have
P(y|x,u) = P(y|x,u,m,U). Furthermore, the coding system used implies
a functional relation for determining the next input letter x, given
m, U and u. Thus x = f(m,U,u). If f(m,U,u) = f(m',U',u) for two
particular pairs (m,U) and (m',U') but for all u, then it follows
that P(y|m,U,u) = P(y|m',U',u) for all u and y; since m, U and u
lead to the same x as m', U', and u. From this we obtain

$$P(y|m,U) = \sum_u P(u)\, P(y|m,U,u) = \sum_u P(u)\, P(y|m',U',u) = P(y|m',U').$$

In other words, (m,U) pairs which give the same function f(m,U,u)
give the same value of P(y|m,U) or, said another way, P(y|m,U) = P(y|X).

Returning now to our inequality (2), we have

$$H(m|Y) - H(m|Y,y) \le \left\langle \log \frac{P(y|X)}{P(y)} \right\rangle \le \max_{P(X)} \left\langle \log \frac{P(y|X)}{P(y)} \right\rangle$$

$$H(m|Y) - H(m|Y,y) \le C.$$

This is the desired inequality on the equivocation. The equivocation cannot be
reduced by more than C, the capacity of the derived channel K', for
each received letter. In particular in a block code with M equiprobable
messages, $R = \frac{1}{n} \log M$. If R > C, then at the end of the block the
equivocation must still be at least nR - nC, since it starts at
nR and can only reduce at most C for each of the n letters.

It is known that if the equivocation per letter is at least R - C
then the probability of error in decoding is at least

$$P_e \ge \frac{R - C}{6\left(R + \frac{1}{n} \ln \frac{R}{R - C}\right)}.$$

Thus the probability of error is bounded away from zero regardless of
the block length n, if the code attempts to send at a rate R > C.
This concludes the proof of the theorem.

As an example of this theorem, consider a channel with two output
letters, any number a of input letters and any number g of states. Then
the derived channel K' has two output letters and $a^g$ input letters.
However, in a channel with just two output letters, only two of the
input letters need be used to achieve channel capacity, as shown in (3).
Namely, we should use in K' only the two letters with maximum and minimum
transition probabilities to one of the output letters. These two may be
found as follows. The transition probabilities for a particular letter of
K' are averages of the corresponding transitions for a set of letters for
K, one for each state. To maximize the transition probability to one of
the output letters, it is clear that we should choose in each state the
letter with the maximum transition to that output letter. Similarly, to
minimize, one chooses in each state the letter with the minimum transition
probability to that letter. These two resulting letters in K' are
the only ones used, and the corresponding channel gives the desired
channel capacity. Formally, then, if the given channel has probabilities
$p_{ti}(1)$ in state t for input letter i to output letter 1, and
$p_{ti}(2) = 1 - p_{ti}(1)$ to the other output letter 2, we calculate:

$$p_1 = \sum_t q_t \max_i p_{ti}(1)$$

$$p_2 = \sum_t q_t \min_i p_{ti}(1)$$

The channel K' with two input letters having transition probabilities
$p_1$ and $1 - p_1$ and $p_2$ and $1 - p_2$ to the two output letters respectively, has
the channel capacity of the original channel K.
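The two-letter reduction lends itself to a short computation. In the sketch below the example data are those of the Fig. 2 channel; the grid search used to find the capacity of the resulting binary channel is a numerical shortcut chosen for this illustration, not a method given in the text, and the function names are hypothetical.

```python
import math

def two_letter_reduction(q, p_to_1):
    """p1 = sum_t q_t max_i p_ti(1) and p2 = sum_t q_t min_i p_ti(1)."""
    p1 = sum(qt * max(row) for qt, row in zip(q, p_to_1))
    p2 = sum(qt * min(row) for qt, row in zip(q, p_to_1))
    return p1, p2

def h2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def binary_channel_capacity(p1, p2, steps=1000):
    """Capacity in bits of the two-input, two-output channel, by grid search
    over the probability w of using the first input letter."""
    best = 0.0
    for k in range(steps + 1):
        w = k / steps
        out = w * p1 + (1.0 - w) * p2
        best = max(best, h2(out) - w * h2(p1) - (1.0 - w) * h2(p2))
    return best

q = [0.5, 0.5]
p_to_1 = [[1.0, 0.0],   # state 1: probability of output letter 1 for each input
          [0.0, 1.0]]   # state 2
p1, p2 = two_letter_reduction(q, p_to_1)
capacity = binary_channel_capacity(p1, p2)
```

For this channel $p_1 = 1$ and $p_2 = 0$, so the reduced channel is noiseless and its capacity is one bit, in agreement with the earlier discussion of Fig. 4.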

Another example, with three output letters, two input letters and
three states, is the following. The probability matrices for the three
states are: (the states assumed to each have probability 1/3)

    State 1            State 2            State 3

  1    0    0        0    1    0        0    0    1
  0   1/2  1/2      1/2   0   1/2      1/2  1/2   0

In this case there are $2^3 = 8$ input letters in the derived channel K'.
The matrix of these is as follows:

  1/2  1/2   0
   0   1/2  1/2
  1/2   0   1/2
  2/3  1/6  1/6
  1/6  2/3  1/6
  1/6  1/6  2/3
  1/3  1/3  1/3
  1/3  1/3  1/3

If  there  are  only  three  output  letters  one  need  use  only  three  input 
letters  to  achieve  channel  capacity,  and  in  this  case  it  is  readily  shown 
that  the  first  three  can  (and  in  fact  must)  be  used.    Due  to  the  symmetry, 
these  three  letters  must  be  used  with  equal  probability  and  the  resulting 
channel  capacity  is  log  3/2. 
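The figures in this example can be reproduced directly. The following sketch rebuilds the eight rows of K' from the three state matrices and evaluates the mutual information for equal use of the first three letters; the 0-based indexing and the function names are assumptions of the illustration.

```python
import math
from itertools import product

Q = [1 / 3, 1 / 3, 1 / 3]
P = [[[1, 0, 0], [0, 0.5, 0.5]],     # state 1
     [[0, 1, 0], [0.5, 0, 0.5]],     # state 2
     [[0, 0, 1], [0.5, 0.5, 0]]]     # state 3

def derived_rows(q, p):
    """All rows r_X(y) = sum_t q[t] * p[t][X[t]][y] of the derived channel K'."""
    g, a, b = len(q), len(p[0]), len(p[0][0])
    return [[sum(q[t] * p[t][X[t]][y] for t in range(g)) for y in range(b)]
            for X in product(range(a), repeat=g)]

def mutual_information(weights, rows):
    """I(X;Y) in bits for input probabilities `weights` over the given rows."""
    out = [sum(w * r[y] for w, r in zip(weights, rows)) for y in range(len(rows[0]))]
    total = 0.0
    for w, r in zip(weights, rows):
        for y, ry in enumerate(r):
            if w > 0 and ry > 0:
                total += w * ry * math.log2(ry / out[y])
    return total

rows = derived_rows(Q, P)
# The "first three" letters are those whose rows contain a zero entry.
first_three = [r for r in rows if min(r) == 0]
capacity = mutual_information([1 / 3] * 3, first_three)
all_eight = mutual_information([1 / 8] * 8, rows)
```

The value obtained is $\log_2 3 - \log_2 2 = \log_2 (3/2)$, about 0.585 bits, while spreading the input equally over all eight letters gives a smaller rate.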

In  the  original  channel,  it  is  easily  seen  that,  if  the  state 
information  were  not  available,  the  channel  would  act  like  one  with  the 
transition  matrix 

1/3  1/3  1/3 

1/3  1/3  1/3 

This  channel  clearly  has  zero  capacity.     On  the  other  hand,  if  the  state 
information  were  available  at  the  receiving  point  or  at  both  the 
receiving  point  and  the  transmitting  point,  the  two  input  letters  can 
be  perfectly  distinguished  and  the  channel  capacity  is  log  2. 


Some  Miscellaneous  Results  in  Coding  Theory 


Claude  E.  Shannon 


This  paper  contains  a  atsmbsr  of  sorr.ewliat  miacsllaneotts  results?  centered 
chiefly  on  the  problem  of  coding  sources  Siii©  noiseless  channels,  lnclnc hr?  • 
cassa  '.-here  the. channel  s/aihals  have  different  durations  or  costs. 

gjre  mnaher  °f  sequences  of  a  given  length 

Svyrcs?  a  na~aer  cf  letters,  are  available:  whose  lengths  (or  durations) 
are  a..,  a.,,  •  •  •»  a    ai*c*  ,'73         a  bound  on  the  number  of  sequence;.',  of  total 

Here  it  is  assuraed  that  any  sequence  of  letters  is  allowed.  17e 
achlne  ?•!(£}  to  be  the  number  of  different  sequences  whose  total  length  is 
greater  than  t  -  cmin  but  not  greater  then  £.   Here  is  the  smallest  a. . 

Ta.as  ?!{;!}  might  be  thoujJ.it  of  as  the  number  of  sequences  of  length  £  where 
we  allow  filling  cut  with  a  blank  to  an  e;;tent  up  to  the  shortest  letter.  This 
.■r"  ;i:;;.o-i  .niches  I;(f }  better  behaved  (e.g.,  it  is  now  monotone  increasing/ 
than  if  we  court  only  sequences  of  eroaeily  length  £. 

$N(\ell)$ satisfies the difference equation

$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_n), \qquad \ell \ge a_{\min}$$

as we see by noting that each sequence of length $\ell$ must end in one or another
of the available letters. Furthermore, the boundary conditions may be taken
to be $N(\ell) = 0$ for $\ell < 0$ and $N(\ell) = 1$ for $0 \le \ell < a_{\min}$.

Associated with the difference equation is the following characteristic
equation:

$$1 = X^{-a_1} + X^{-a_2} + \cdots + X^{-a_n}$$

Since all the $a_i$ are positive and real, the right-hand member is a strictly
monotone decreasing function of $X$ and varies from $\infty$ to 0 when $X$ goes
from 0 to $\infty$. Consequently, the characteristic equation has a unique
positive real root $W$.
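Because the right-hand member is strictly decreasing, the root can be located numerically by bisection. The following is a minimal sketch under that observation; the function name `characteristic_root` and the tolerance are illustrative choices, not part of the paper.

```python
def characteristic_root(lengths, tol=1e-12):
    """Return the unique positive root W of 1 = sum(W**-a for a in lengths)."""
    def rhs(x):
        return sum(x ** -a for a in lengths)

    # rhs is strictly decreasing and rhs(1) equals the number of letters (>= 1),
    # so the root is at least 1.  Expand hi until the root is bracketed.
    lo, hi = 1.0, 2.0
    while rhs(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid) > 1.0:
            lo = mid      # root lies above mid
        else:
            hi = mid      # root lies at or below mid
    return 0.5 * (lo + hi)


# Two letters of durations 1 and 2 give W**2 = W + 1 (the golden ratio).
W = characteristic_root([1, 2])
```

With two letters of equal duration 1 the same routine returns 2, the familiar binary case.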

Theorem: For all $\ell$, $N(\ell) \le W^{\ell}$. For all $\ell \ge 0$, $N(\ell) \ge W^{\ell - a_{\max}}$.

This will be proved by a kind of induction on increasing intervals of $\ell$,
each interval of length $a_{\min}$. Consider first the upper bound $W^{\ell}$. This
is certainly true for $0 \le \ell < a_{\min}$, since in this range $N(\ell) = 1$ and $W \ge 1$.
Now assume the upper bound true out to some $\ell_1$. Then for $\ell$ in the range
$\ell_1 \le \ell < \ell_1 + a_{\min}$ we have

$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_n)$$
$$\le W^{\ell - a_1} + W^{\ell - a_2} + \cdots + W^{\ell - a_n} = W^{\ell}$$

Thus the theorem is then true for the increased interval up to $\ell_1 + a_{\min}$.
It follows that the bound is true for all $\ell$.

The lower bound is very similar. It is certainly true for $0 \le \ell \le a_{\max}$
since $N(\ell) \ge 1$ in this range and $W^{\ell - a_{\max}} \le 1$. The inductive step goes
through as before. Assuming that for $0 \le \ell < \ell_1$ (with $\ell_1 \ge a_{\max}$) we have
$N(\ell) \ge W^{\ell - a_{\max}}$, then in the extended range from $\ell_1$ to $\ell_1 + a_{\min}$ we have

$$N(\ell) = \sum_i N(\ell - a_i) \ge \sum_i W^{\ell - a_i - a_{\max}} = W^{\ell - a_{\max}}$$

Thus by extending the range with steps of $a_{\min}$ we obtain the result for
all positive $\ell$.

This result, of course, relates to how rapidly it is possible to approach
the capacity of a noiseless channel with unequal symbol lengths. Thus
for $\ell > 0$, from this theorem

$$\left(1 - \frac{a_{\max}}{\ell}\right) \log W \le \frac{1}{\ell} \log N(\ell) \le \log W$$

The approach of possible signalling rate to the capacity $\log W$ is rapid,
the discrepancy being at most $\frac{a_{\max}}{\ell} \log W$.
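The two bounds can be checked numerically when the letter lengths are positive integers. The sketch below counts N(l) directly from its definition (sequences whose total length lies in the window just below l) and compares it with the two powers of W. The helper names and the example lengths 2 and 3 are assumptions made for illustration.

```python
def characteristic_root(lengths, tol=1e-12):
    """Unique positive root W of 1 = sum(W**-a), found by bisection."""
    lo, hi = 1.0, 2.0
    while sum(hi ** -a for a in lengths) > 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(mid ** -a for a in lengths) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def count_sequences(lengths, max_len):
    """N(l) for l = 0..max_len, assuming positive integer letter lengths."""
    a_min = min(lengths)
    # exact[m] = number of letter sequences of total length exactly m.
    exact = [0] * (max_len + 1)
    exact[0] = 1  # the empty sequence
    for m in range(1, max_len + 1):
        exact[m] = sum(exact[m - a] for a in lengths if m - a >= 0)
    # N(l) counts total lengths greater than l - a_min and at most l.
    return [sum(exact[max(0, l - a_min + 1): l + 1]) for l in range(max_len + 1)]


lengths = [2, 3]
W = characteristic_root(lengths)
counts = count_sequences(lengths, 7)
for l, n in enumerate(counts):
    print(l, round(W ** (l - max(lengths)), 3), n, round(W ** l, 3))
```

Each printed row shows the lower bound, N(l), and the upper bound side by side.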

An interesting alternative proof that $N(\ell) \le W^{\ell}$ can be given as
follows. Assume, in contradiction, that for some $\ell$, $N(\ell) > W^{\ell}$. Then,
since $N(0) = W^{0}$, there is a greatest lower bound of $\ell$'s, say $\ell^*$, for
which the theorem fails. In the interval $\ell^* \le \ell < \ell^* + a_{\min}$ there must
be an $\ell$, say $\ell_1$, for which the theorem fails. Subdivide the sequences
of length $\ell_1$ into subsets according to the first letter. Let the fractional
number in the subset beginning with the letter $i$ be $f_i$ ($i = 1, 2, \ldots, n$).
Choose the subset for which $a_i^{-1} \log f_i^{-1}$ is a minimum. In a sense, this
means the subset which conveys the least information, $\log f_i^{-1}$, per unit
time in its first letter. The minimum value of $a_i^{-1} \log f_i^{-1}$ among the
different subsets is less than or equal to $\log W$. To see this, suppose,
in contradiction, that for all $i$, $a_i^{-1} \log f_i^{-1} > \log W$. Then $f_i < W^{-a_i}$ and,
summing on $i$, $1 = \sum_i f_i < \sum_i W^{-a_i} = 1$, a contradiction. Hence the subset
chosen will have $a_i^{-1} \log f_i^{-1} \le \log W$, or $f_i \ge W^{-a_i}$. If we delete the
first letter from all sequences in this subset, we are left with a set of
more than $W^{\ell_1 - a_i}$ sequences of length $\ell_1 - a_i$. Thus $N(\ell_1 - a_i) > W^{\ell_1 - a_i}$.
Since $\ell_1 - a_i < \ell^*$ this contradicts the assumption that $\ell^*$ was the
greatest lower bound of $\ell$'s for which the theorem fails. Hence the
theorem is true for all $\ell$.


The case with unequal letters and a finite set of constraints

A more general problem of the same sort relates to sequences which
are subject to a finite state set of constraints. Thus, suppose there are
$d$ states and that in state $i$, letters of lengths $\ell_{ij}^{(s)}$ are permitted leading
to state $j$. The index $s$ ranges over the different letters going from
state $i$ to state $j$ and $j$ ranges over the different states which can follow
state $i$. Let $N_{ij}(\ell)$ be the number of sequences which are possible
and which start in state $i$, end in state $j$ and are of length $\ell$. These
quantities are readily seen to satisfy the difference equations

$$N_{ij}(\ell) = \sum_{k,s} N_{ik}\left(\ell - \ell_{kj}^{(s)}\right) \qquad (1)$$

The corresponding characteristic equations are

$$A_j = \sum_{k,s} A_k W^{-\ell_{kj}^{(s)}} \qquad (2)$$

Let $W$ be the largest real root (there is a positive real root as shown in
the appendix) of the determinant equation:

$$\left| \sum_s W^{-\ell_{ij}^{(s)}} - \delta_{ij} \right| = 0$$

and let $A_i$ be a corresponding (positive) solution of (2). We assume
the graph of the constraints has complete accessibility so it is possible
to go from any state to any other. Then all the $A_i$ are positive (none
vanish).

We will show that the number of sequences of length $\ell$ starting in
state $i$ and ending in $j$, $N_{ij}(\ell)$, is bounded by

$$N_{ij}(\ell) \le \frac{A_j}{A_i} W^{\ell}$$

This is certainly true for $\ell < 0$ and also for $\ell = 0$ since then both sides
are one if $i = j$ and otherwise the left side is zero with the right positive.
We now proceed by the inductive type process as before, assuming
the inequality out to some $\ell_1$ and then show it follows for $\ell$ out to $\ell_1$ plus
the minimum $\ell_{ij}^{(s)}$:

$$N_{ij}(\ell) = \sum_{k,s} N_{ik}\left(\ell - \ell_{kj}^{(s)}\right) \le \sum_{k,s} \frac{A_k}{A_i} W^{\ell - \ell_{kj}^{(s)}} = \frac{A_j}{A_i} W^{\ell}$$

Thus the inductive step carries the inequality up to $\ell = \ell_1 + \min \ell_{ij}^{(s)}$ and
hence it is true for all $\ell$.
An explicit code for a variable length alphabet


It is possible to generalize a coding process we have described
elsewhere for a binary alphabet to the case where there are a number
of symbols of different "durations" or, more generally, with certain
associated "costs." It is desired to encode a finite set of possible
messages with associated probabilities $p_1, p_2, \ldots, p_m$ into sequences
of letters chosen from an alphabet where the letter $i$ has cost or
duration $\ell_i$ and it is desired in the code to minimize the expected
cost. This problem has been studied in a thesis by Richard Marcus.

We shall use in our analysis a curious notation for real numbers
based on unequal values for various digits. In the ordinary decimal
notation, the range from 0 to 1 is divided into ten equal intervals.
These are labeled with the digits from 0 to 9. Each of these intervals
is again subdivided equally and again given labels. In the notation
system we are now describing, the interval is subdivided into arbitrary
sub-intervals of length $\lambda_0, \lambda_1, \ldots, \lambda_{n-1}$, not necessarily equal but
with $\sum_k \lambda_k = 1$. If a real number between 0 and 1 falls in the interval $\lambda_k$
(closed on the left, open on the right) its first digit is $k$. All of the
intervals are subdivided in the same proportions and this determines the
second digit, etc.
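The digit-by-digit subdivision just described can be sketched as a short routine. This is an illustrative implementation, not taken from the paper: the function name `expand`, the argument names, and the choice of example widths are assumptions.

```python
def expand(x, widths, places):
    """First `places` digits of x in [0, 1) for sub-interval widths summing to 1."""
    digits = []
    low, size = 0.0, 1.0          # current interval is [low, low + size)
    for _ in range(places):
        for k, lam in enumerate(widths):
            step = size * lam     # width of sub-interval k inside the current interval
            # The last sub-interval is taken unconditionally to guard against
            # floating-point round-off at the right edge.
            if x < low + step or k == len(widths) - 1:
                digits.append(k)
                size = step
                break
            low += step           # move past sub-interval k
    return digits


# Unequal widths 1/2, 1/4, 1/4: the number 0.6 has digits 1, 0, 2.
print(expand(0.6, [0.5, 0.25, 0.25], 3))
# Equal widths reproduce ordinary binary notation: 0.625 = 0.101 in binary.
print(expand(0.625, [0.5, 0.5], 3))
```

The example widths are powers of two so that every interval boundary is exactly representable in floating point.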

This notation system has many of the properties of ordinary binary,
ternary, etc., systems such as unicity of representation, apart from
numbers terminating in an infinite sequence of 0's or $(n-1)$'s. However,
it does differ in certain important respects. For example, if a real
number is chosen at random, then in an ordinary decimal notation we
expect one-tenth of each value of digit. In this notation we expect $\lambda_i$ of
digit $i$.

Returning now to the coding problem, we recall that if a set of
channel letters have durations $\ell_1, \ell_2, \ldots, \ell_n$ the corresponding channel
capacity is $C = \log W_0$ where $W_0$ is the unique positive real root of

$$\sum_i W^{-\ell_i} = 1$$

Given a set of $\ell_i$ and the corresponding $W_0$ we define a subdivision of
the unit interval and a corresponding notation for real numbers by the
quantities

$$W_0^{-\ell_1}, \; W_0^{-\ell_2}, \; \ldots, \; W_0^{-\ell_n}$$

Since these are all positive and their sum is unity they form a satisfactory
subdivision.

Now let a set of messages have probabilities $p_1 \ge p_2 \ge \cdots \ge p_m$ and
let $\sum_{i=1}^{k-1} p_i = P_k$, so $P_k$ is the cumulative probability for the first $k - 1$ when
the messages are arranged in order of decreasing probability.

The code to be used is defined as follows. Let $P_k$ be expanded in
the notation defined by the subdivision $W^{-\ell_i}$ out to just enough places
to make the uncertainty due to "digits" beyond this point less than $p_k$.
In other words, if $P_k$ is represented in this notation system by the
sequence $a_{k1}, a_{k2}, a_{k3}, \ldots$ then we carry out the expansion for $P_k$
to $t$ places where $t$ is chosen to make

$$\prod_{i=1}^{t} W^{-\ell_{a_{ki}}} \le p_k < \prod_{i=1}^{t-1} W^{-\ell_{a_{ki}}} \qquad (1)$$
The code we are defining represents message $k$ by the sequence of
channel symbols corresponding to the $t$ digits of this expansion of $P_k$.
It should first be noted that this does in fact form a reversible code. It
satisfies the so-called prefix condition: no code word is the beginning
of any other code word. Indeed the code word corresponding to $P_k$
defines an interval including $P_k$ and of width less than $p_k$. This
interval consequently does not include $P_{k-1}$ or any earlier $P$, and the
code word must differ in some "digit" from all preceding code words.
Consequently all code words differ and any sequence of code words is
uniquely decipherable.
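The construction can be tried out on a small case. The sketch below is a hedged illustration rather than the paper's own procedure in every detail: it assumes the cumulative probability of a message is the sum of the probabilities preceding it, it stops expanding as soon as the interval width is no greater than the message probability, and the names `characteristic_root` and `digit_expansion_code` and the example values are hypothetical.

```python
import math


def characteristic_root(costs, tol=1e-12):
    """Unique positive root W of 1 = sum(W**-c), found by bisection."""
    lo, hi = 1.0, 2.0
    while sum(hi ** -c for c in costs) > 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(mid ** -c for c in costs) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def digit_expansion_code(probs, costs):
    """Code words (lists of letter indices) for probabilities in decreasing order."""
    W = characteristic_root(costs)
    widths = [W ** -c for c in costs]      # subdivision of the unit interval
    words, cum = [], 0.0                   # cum = probability of earlier messages
    for p in probs:
        low, size, word = 0.0, 1.0, []
        while size > p:                    # expand until the interval is no wider than p
            for k, lam in enumerate(widths):
                step = size * lam
                if cum < low + step or k == len(widths) - 1:
                    word.append(k)
                    size = step
                    break
                low += step
        words.append(word)
        cum += p
    return W, words


probs = [0.4, 0.3, 0.2, 0.1]
costs = [1, 2]                             # letter 0 costs 1, letter 1 costs 2
W, words = digit_expansion_code(probs, costs)
avg_cost = sum(p * sum(costs[k] for k in w) for p, w in zip(probs, words))
H = -sum(p * math.log(p) for p in probs)   # entropy in natural units
print(words, avg_cost, H / math.log(W))
```

For this example the expected cost lies between the ideal value H / log W and that value plus the largest letter cost.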

We now wish to estimate the expected length of code words, that is,
$\sum_k p_k \sum_{i=1}^{t} \ell_{a_{ki}}$. From (1) we have

$$\log W \cdot \sum_{i=1}^{t} \ell_{a_{ki}} \ge \log p_k^{-1} > \log W \cdot \sum_{i=1}^{t-1} \ell_{a_{ki}}$$

Multiplying by $p_k$ and summing over all $k$ gives

$$\log W \cdot \sum_k p_k L_k \ge \sum_k p_k \log p_k^{-1} > \log W \cdot \sum_k p_k (L_k - \ell_{\max})$$

where $L_k = \sum_{i=1}^{t} \ell_{a_{ki}}$ is the length of the code word for message $k$ and, on
the right, we have underestimated by replacing the last term in the sum on
$i$ by its maximum possible value, $\ell_{\max}$, the largest duration of any letter.
Now recognizing that $\sum_k p_k \log p_k^{-1} = H$, the entropy or average information
of the message source, and using $L = \sum_k p_k L_k$ to represent the expected
length of a code word this may be written

$$L \log W \ge H > (L - \ell_{\max}) \log W$$

or

$$\frac{H}{\log W} \le L < \frac{H}{\log W} + \ell_{\max}$$

This is our desired result. Of course the lower bound holds for any
reversible code. The upper bound shows that one can approximate the
ideal lower bound to within $\ell_{\max}$. In particular, if one is working with
messages which consist of blocks or n-grams of text, then $H$ becomes
$nH_n$ where $H_n$ is the entropy per letter for blocks of length $n$. As $n$
increases, $H_n$ approaches $H$, the entropy per letter of the message
source.

Dividing the inequalities by $n$ we have, in this case,

$$\frac{H_n}{\log W} \le \frac{L}{n} < \frac{H_n}{\log W} + \frac{\ell_{\max}}{n}$$

In other words, the average code length per letter of message has a
discrepancy $\ell_{\max}/n$ at most from its ideal value on the basis of n-gram
entropy. This is closely analogous to our previous result with channel
letters of equal duration.

An  inequality  for  a  Huffman-type  code 

A Huffman code (2) for cases of equal cost binary symbols is optimal
in giving the minimum expected length and must therefore have an
expected length less than or equal to $H + 1$ since, as shown above
or in (1), these digit expansion codes, which are not necessarily optimal,
satisfy this inequality. Peter Elias suggested the desirability of a direct
proof from the Huffman procedure of this upper bound. In solving this
problem a slightly stronger result was obtained as follows.

Theorem: In a Huffman binary code, the expected word length $\bar{\ell}$
satisfies $H \le \bar{\ell} \le H + 1 - 2p_{\min}$, where $H$ is the entropy (in bits) of the
set of probabilities and $p_{\min}$ is the smallest probability in the set.
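The inequality is easy to test numerically. The sketch below builds a binary Huffman tree with the standard library heap and uses the fact that the expected word length equals the sum of the weights of the merged nodes; the function names are illustrative and the example probabilities are arbitrary.

```python
import heapq
import math


def huffman_expected_length(probs):
    """Expected word length of a binary Huffman code for the given probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        # Merge the two smallest probabilities; every leaf below the merged
        # node gains one binary digit, which adds `merged` to the expectation.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


probs = [0.4, 0.3, 0.2, 0.1]
length = huffman_expected_length(probs)
H = entropy_bits(probs)
upper = H + 1 - 2 * min(probs)
print(H, length, upper)
```

For two equally likely messages the expected length, the entropy and the upper bound all equal 1, so the upper bound can be met with equality.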

Proof: The lower bound is of course well known. The upper bound
will be proved by induction. We will assume it true for all codes
corresponding to trees with $n - 1$ branch points and show that it follows
for those with $n$ branch points (Fig. 1). Consider, then, a Huffman tree with
$n$ branch points and focus attention on the two smallest probabilities.
These occur, by the method of construction, at ends of one fork. Let
these probabilities be $p$ and $q$ with, say, $p \le q$. If we delete the $p$, $q$
branches leaving $P = p + q$ at the junction, we have left a Huffman tree
(because of the method of construction) with $n - 1$ junctions and to which
our inductive assumption applies. Let unprimed letters relate to this
tree and primed letters to the enlarged tree. Then we have $p = p'_{\min}$
(the minimum probability for the enlarged tree) and since both $p$ and $q$
are less than or equal to $p_{\min}$ (the minimum probability for the smaller tree),
$P \le 2p_{\min}$. Also the average code lengths $\bar{\ell}$ and $\bar{\ell}'$ and the entropy $H$ and
$H'$ are clearly related as follows:

$$\bar{\ell}' = \bar{\ell} + P$$

$$H' = H + P \, H\!\left(\frac{p}{P}, \frac{q}{P}\right)$$

Finally by inductive assumption

$$\bar{\ell} \le H + 1 - 2p_{\min}$$

Fig. 1

Fig. 2

Hence

$$\bar{\ell}' \le H + 1 - 2p_{\min} + P = H' - P \, H\!\left(\frac{p}{P}, \frac{q}{P}\right) + 1 - 2p_{\min} + P \le H' - P \, H\!\left(\frac{p}{P}, \frac{q}{P}\right) + 1 \qquad (2)$$

since $P \le 2p_{\min}$. Now by the convexity of the curve $H(x, 1-x)$ we have,
for $x \le \tfrac{1}{2}$, that $H(x, 1-x) \ge 2x$ (Fig. 2). Hence, recalling that $p \le q$
we have

$$H\!\left(\frac{p}{P}, \frac{q}{P}\right) \ge \frac{2p}{P}$$

$$P \, H\!\left(\frac{p}{P}, \frac{q}{P}\right) \ge 2p$$

Using this in the above inequality (2) we conclude

$$\bar{\ell}' \le H' + 1 - 2p$$

Since $p = p'_{\min}$ this completes the induction. The theorem is true
for one branch point, probabilities $p$ and $q \ge p$, since in this case

$$\bar{\ell} = 1 \le H(p, q) + 1 - 2p$$

using, again, the fact that $H(x, 1-x) \ge 2x$.

This result is easily generalized to the case where there are $b$
available (equal length) letters in the alphabet. In this case it can be
shown that $\bar{\ell} \le \frac{H}{\log b} + 1 - d\,p_{\min}$ where $d$ is the number of branches
on the minimum probability branch point of the tree. Thus $d$ is the
remainder if $b - 1$ is subtracted from $n$, the number of messages,
enough times to give a remainder less than or equal to $b$. The proof
of this result is by the obvious generalization of the above proof and
is left to the reader.


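Appendix: Existence of the characteristic equation root

Lemma: Given $f_{ij}(w)$ ($i, j = 1, 2, \ldots, d$) continuous functions of $w$
in the range $a \le w \le b$ and in this range $f_{ij}(w) \ge 0$, $\sum_j f_{ij}(w) > 0$,
$\sum_j f_{ij}(a) < 1$, $\sum_j f_{ij}(b) > 1$, then there exists $W$, $a \le W \le b$, and a set of
$X_i \ge 0$, $\sum_i X_i = 1$, such that $\sum_i X_i f_{ij}(W) = X_j$, so that

$$\left| f_{ij}(W) - \delta_{ij} \right| = 0$$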


Proof: Consider the $d$-dimensional region $R$ whose points are
$(X_1, \ldots, X_d, W)$, where $X_i \ge 0$, $\sum_i X_i = 1$, and $a \le W \le b$. This is a
topological image of a sphere and its interior. For the $X_i$ as above and $W$
in the range from $a$ to $b$, consider the continuous mapping

$$Y_j = \frac{\sum_i X_i f_{ij}(W)}{\sum_{i,j} X_i f_{ij}(W)}$$

$$V_1 = W + 1 - \sum_{i,j} X_i f_{ij}(W)$$

$$V = \begin{cases} V_1 & \text{if } a \le V_1 \le b \\ a & \text{if } V_1 < a \\ b & \text{if } V_1 > b \end{cases}$$

Note that the denominator for $Y_j$ does not vanish because of our
assumption that $\sum_j f_{ij}(w) > 0$ and hence the $Y_j$ are well defined. Also
the $Y_j$ are non-negative and $\sum_j Y_j = 1$. Finally $a \le V \le b$. Hence
this maps points $(X_i, W)$ in $R$ continuously into points $(Y_i, V)$ in $R$.
Consequently, by the Brouwer fixed point theorem there exists a
point $(X_i, W)$ which is mapped into itself, that is, a point for which
$\sum_i X_i f_{ij}(W) = X_j \sum_{i,j} X_i f_{ij}(W)$, $W = V$. The value of $W$ for the fixpoint
clearly is not $a$ or $b$ since these points are moved upward or downward
by our assumptions. Hence for the fixpoint we have
$W = W + 1 - \sum_{i,j} f_{ij}(W) X_i$ or $\sum_{i,j} f_{ij}(W) X_i = 1$. It follows that for the
fixpoint $\sum_i X_i f_{ij}(W) = X_j$, and hence

$$\left| f_{ij}(W) - \delta_{ij} \right| = 0$$

Let the elements $a_{ij}$ of a matrix be non-negative. Suppose there
is an eigenvector $A_i$ all of whose components are positive, $A_i > 0$,
and the corresponding characteristic value is $\lambda_0$. We will show that
for any other characteristic value $\lambda_1$ we have $|\lambda_1| \le \lambda_0$. Let $B_i$ be
a characteristic vector for $\lambda_1$ where we adjust the length of this vector
as follows. Choose its length in such a way that $A_i - |B_i| \ge 0$ for all
$i$ and the equality holds for at least one $i$, say $i = h$, so that $A_h = |B_h|$.
It is clear that this can be done since with zero length all components of
$B$ are less than those of $A$ and increasing continuously, eventually a
first one of the $|B_i|$ reaches its corresponding $A_i$. We now have

$$\sum_i A_i a_{ij} = \lambda_0 A_j \qquad (1)$$

$$\sum_i |B_i| a_{ij} \ge |\lambda_1| \, |B_j| \qquad (2)$$

Subtracting these equations for $j = h$,

$$\sum_i \left(A_i - |B_i|\right) a_{ih} \le \left(\lambda_0 - |\lambda_1|\right) A_h$$

All terms in the sum at the left are non-negative and also $A_h$ is
definitely positive. It follows that $\lambda_0 - |\lambda_1| \ge 0$.


Error Probability Bounds for Noisy Channels

This paper gives a simplified proof akin to that published previously (3)
but leading to tighter bounds on the error probability and to
a simpler final result. We consider a discrete memoryless channel
defined by a set of letter transition probabilities $p_i(j)$. Assume a
given assignment of input letter probabilities $P_i$. These might be,
but not necessarily, the set which gives channel capacity. Let
$Q_j = \sum_i P_i p_i(j)$ be the output letter probabilities that would result if
the $P_i$ were used for input probabilities.

We consider, as usual, a random code ensemble based on the $P_i$
containing $M = e^{nR}$ messages each with code word of length $n$. In the
ensemble of codes, $M$ messages, say the integers from 1 to $M$, are
mapped independently into the possible input words of length $n$ for
the channel. A message is mapped into a code word with probability
equal to that of the code word produced by the product probabilities
generated by the $P_i$. Thus, the various possible codes in the ensemble
have associated probabilities equal to the probability of their occurrence
under this system. We wish to overbound the average error probability
for this ensemble of codes with a decoding system to be described,
where the error probabilities of individual codes are weighted with the
probabilities associated with the particular codes.

The mutual information $I(u; v)$ between an input word $u$ and an output
word $v$ is given by

$$I(u; v) = \log \frac{\Pr_1(v \mid u)}{\Pr_1(v)}$$

where $\Pr_1$ means probability calculated by the given letter assignment
$P_i$ and the given transitions $p_i(j)$, (extended independently to blocks of
length $n$). $I(u; v)$ may be thought of here as a number associated with
any input word-output word pair. $I(u; v)$ is the sum of the mutual
informations between corresponding letters of $u$ and $v$. Thus if $u$
consists of the letters $u_1, u_2, \ldots, u_n$ and $v$ of $v_1, v_2, \ldots, v_n$ then,
because of the independence of channel and letter assignments, we have

$$I(u; v) = \log \frac{\prod_i \Pr_1(v_i \mid u_i)}{\prod_i \Pr_1(v_i)} = \sum_i \log \frac{\Pr_1(v_i \mid u_i)}{\Pr_1(v_i)}$$

If we now think of choosing an input word $u$ and an output word $v$
according to some joint probability, then $I(u; v)$ becomes a random
variable. In particular, we may choose an input word $u$ according to
the product probability measure obtained from the probability
assignments $P_i$, and then an output word $v$ according to the transition
probabilities $p_i(j)$, (independently applied to the letters of $u$). In the ensemble
of random codes, input words $u$ and noisy received words $v$ will occur
with this joint probability measure.

We define a decoding system for codes in the ensemble as follows.
Any received word $v$ is decoded as that message in the code in question
whose code word $u$ has the largest $I(u; v)$. (If several have equal values,
take the smallest numbered message from this set.) This might be
called maximum information decoding. It must be remembered, however,
that mutual information is here calculated by the original probability
assignments produced by the $P_i$. It is not necessarily maximum
information decoding for any particular code or word in a code in the ensemble.
It is actually, however, equivalent to decoding as the most probable
cause of the received word, and therefore is optimal to give small error
probability. This is because all messages have equal a priori probability
so if

$$\log \frac{P(v \mid u_1)}{P(v)} > \log \frac{P(v \mid u_2)}{P(v)} \quad \text{then} \quad \frac{P(v \mid u_1)}{P(v)} > \frac{P(v \mid u_2)}{P(v)}$$

Hence if message $m_1$ is mapped into $u_1$ and $m_2$ into $u_2$ it follows from their
equal prior probability that $P(m_1 \mid v) > P(m_2 \mid v)$.

We may also define a second joint probability measure for $(u, v)$
pairs as follows. Consider choosing a $u$ word according to the assigned
probabilities and a $v$ word independently according to the assigned
probabilities. This joint probability measure we denote by $\Pr_2$.
We may also consider $I(u, v)$ as a random variable with this set of
probabilities $\Pr_2(u, v)$ for $(u, v)$ pairs. However, a peculiar point
arises in that some of the $P(v \mid u)$ may be zero. For these $(u, v)$ pairs,
$I(u, v) = \log \frac{P(v \mid u)}{P(v)}$ is undefined. (It approaches $-\infty$ as $P(v \mid u)$ approaches
zero.) This caused no trouble in the $\Pr_1$ probability measure since these
$(u, v)$ pairs had zero probability in that case. Here, however, these $(u, v)$
pairs may have positive probability. We may still, however, consider
the distribution function for $I(u, v)$ in the new $\Pr_2$ measure. Thus
$\Pr_2[I \ge \theta]$ means the sum of probabilities of all $(u, v)$ pairs in this measure
for which $I(u, v)$ is defined and at least $\theta$. In other words, calculate
the distribution function as though there were a certain probability of $I$
being at $-\infty$. The cumulative distribution function from the left would start
not at zero but at a positive value if there were some $(u, v)$ pairs with
$P(v \mid u) = 0$.

Lemma: For any $\theta$, the average error probability $P_e$ for the
described ensemble of codes is bounded by

$$P_e \le \Pr_1[I \le \theta] + M \Pr_2[I \ge \theta]$$

Proof: In the ensemble of codes, input words and received versions
of those occur with the probability measure $\Pr_1(u, v)$. Thus, in the
lemma, the term $\Pr_1[I \le \theta]$ can be identified with the probability of a
message resulting in a received word with mutual information as low or
lower than a threshold level $\theta$.

The term $\Pr_2[I \ge \theta]$ may be interpreted as follows. In the ensemble
of codes, suppose message number 1 occurs. This will give rise to
various received $v$'s with probability (over the ensemble) given by $\Pr_1(v)$.
This is because message 1 is mapped into all possible $u$'s with the
assigned $u$ probabilities. Now consider the probability that message
number 2 has a mutual information with the received version of message 1
greater than or equal to $\theta$. This is given by $\Pr_2[I \ge \theta]$, since in the
ensemble message 2 is mapped independently into the $u$ space. The same
applies, of course, to messages 3, 4, ..., $M$ and, in fact, for all
messages other than the actual cause of the received word. The probability
that any message (apart from the actual cause) has a mutual information
with the received $v$ exceeding $\theta$ is given exactly by

$$1 - \left(1 - \Pr_2[I \ge \theta]\right)^{M-1} \le (M-1) \Pr_2[I \ge \theta] \le M \Pr_2[I \ge \theta]$$

Thus the probability that either the actual message has a mutual
information less than $\theta$ or that some other message has a greater than $\theta$
mutual information with the received $v$ is bounded by

$$\Pr_1[I \le \theta] + M \Pr_2[I \ge \theta]$$

(The probability of either or both of two events can always be bounded by
the sum of their individual probabilities, whether or not the events are
independent.) If neither of these events occurs, the decoding will be
correct since the mutual information with the actual cause is greater than
$\theta$ and that with all other messages is less than $\theta$. Thus the error
probability in the ensemble is bounded by

$$P_e \le \Pr_1[I \le \theta] + M \Pr_2[I \ge \theta]$$

As an example of a random code ensemble, consider the following
situation. Suppose there are two input words and two output words with
the transition probabilities shown in Fig. 1.

FIG. 1

The arbitrary assignments of probability .6 and .4 have been made to
$U_1$ and $U_2$, and this results in .7 × .6 + .5 × .4 = .62 for $Q(V_1)$ and
.38 for $Q(V_2)$. Suppose there are two messages, 1 and 2. The random
ensemble of codes then consists of four codes.


code 1    Pr(code 1) = .6 × .6 = .36
          coding:   1 -> U1,  2 -> U1        decoding:   V1 -> 1,  V2 -> 1

code 2    Pr(code 2) = .6 × .4 = .24
          coding:   1 -> U1,  2 -> U2        decoding:   V1 -> 1,  V2 -> 2

code 3    Pr(code 3) = .6 × .4 = .24
          coding:   1 -> U2,  2 -> U1        decoding:   V1 -> 2,  V2 -> 1

code 4    Pr(code 4) = .4 × .4 = .16
          coding:   1 -> U2,  2 -> U2        decoding:   V1 -> 1,  V2 -> 1


The distribution of mutual information under the two measures
$\Pr_1$ and $\Pr_2$ is given by the following table:

          Pr1                Pr2                  I (bits)
U1V1      .6 × .7 = .42      .6 × .62 = .372       .177
U1V2      .18                .228                 -.340
U2V1      .20                .31                  -.308
U2V2      .20                .19                   .397

The functions $p_1$ and $1 - p_2$, together with the sum
$p_1 + (M-1)(1 - p_2) = p_1 + 1 - p_2$ (since $M = 2$) are shown in Fig. 2.
FIG. 2


This example, of course, uses a ridiculously small $M$ and small
number of input and output words in order to keep the number of codes
and other complexities down. According to the theorem, the error
probability will not exceed the curve $p_1(x) + 1 - p_2(x)$ at any point. The
best choice of $x$ is clearly one between -.34 and .177 for which the
sum curve is .942. Thus we may assert that for the random ensemble
$P_e \le .942$. Actually, if the messages are equally likely, the error
probability is given exactly by

$$P_e = .36(.5) + .24(.4) + .24(.4) + .16(.5) = .452$$

An optimal code for two messages into this channel clearly maps them
into the two input words and gives an error probability with optimal
decoding of .4. The bound of the theorem does not become very useful
or significant until the number of messages and possible input words
is reasonably large.
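The table and the sum curve can be recomputed directly from the stated probabilities. The sketch below assumes the transition probabilities implied by the text (.7 and .3 from U1, .5 and .5 from U2) and takes Pr2 to be the product of the input word probability and the output probability Q. Under those assumptions the Pr2 entries for U2V1 and U2V2 come out as .4 × .62 = .248 and .4 × .38 = .152, and the minimum of the sum curve as .904, so the printed values .31, .19 and .942 may reflect an arithmetic or transcription slip. Variable names are illustrative.

```python
import math

P = [0.6, 0.4]                    # probabilities assigned to U1, U2
p = [[0.7, 0.3], [0.5, 0.5]]      # p[i][j] = probability of Vj given Ui
M = 2                             # number of messages

# Output probabilities Q(Vj) induced by the assignment P.
Q = [sum(P[i] * p[i][j] for i in range(2)) for j in range(2)]

# One row per (u, v) pair: mutual information in bits, Pr1, Pr2.
rows = []
for i in range(2):
    for j in range(2):
        info = math.log2(p[i][j] / Q[j])
        rows.append((info, P[i] * p[i][j], P[i] * Q[j]))


def sum_curve(x):
    """Pr1[I <= x] + (M - 1) * Pr2[I > x], the curve p1 + 1 - p2 of the text."""
    low_tail = sum(pr1 for info, pr1, _ in rows if info <= x)
    high_tail = sum(pr2 for info, _, pr2 in rows if info > x)
    return low_tail + (M - 1) * high_tail


# The curve is a step function, so its minimum is attained at one of the I values.
best = min(sum_curve(info) for info, _, _ in rows)
for info, pr1, pr2 in rows:
    print(round(info, 3), round(pr1, 3), round(pr2, 3))
print("minimum of the sum curve:", round(best, 3))
```

The recomputed mutual informations agree with the printed column to within about .002.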

We now wish to express the bound of the lemma in terms of the
assigned probabilities $P_i$ and the transition probabilities $p_i(j)$. As
noted above, $I$ is the sum of $n$ independent random variables (the
mutual informations between corresponding letters of $u$ and $v$). This
is true both for the $\Pr_1$ and $\Pr_2$ probability measures. Thus each
term of our bound relates to the problem of estimating the tail of a
distribution which is the sum of $n$ identically distributed random
variables. We may conveniently estimate such tails by the Chernoff
bound involving the logarithm of the moment generating function, say
$\mu(s)$, of the individual random variables. Chernoff's bound states that
if $X_n$ is the sum of $n$ such random variables, then

$$\Pr[X_n \ge n\mu'(s)] \le e^{n(\mu(s) - s\mu'(s))}, \qquad s > 0$$
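As an illustration of how the bound behaves, the sketch below compares it with an exact tail probability for a sum of independent binary variables. The Bernoulli setting, the parameter values and the function names are assumptions chosen so that the exact tail can be computed in closed form; they are not taken from the paper.

```python
import math


def chernoff_terms(p, s):
    """Return (mu(s) - s*mu'(s), mu'(s)) for a variable equal to 1 with probability p."""
    z = 1 - p + p * math.exp(s)          # moment generating function E[exp(sX)]
    mu = math.log(z)
    mu_prime = p * math.exp(s) / z
    return mu - s * mu_prime, mu_prime


def exact_tail(n, p, threshold):
    """Pr[X_n >= threshold] when X_n is binomial with parameters n and p."""
    k0 = math.ceil(threshold)
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k0, n + 1))


n, p, s = 50, 0.3, 0.8
exponent, mu_prime = chernoff_terms(p, s)
bound = math.exp(n * exponent)           # right-hand side of the Chernoff bound
tail = exact_tail(n, p, n * mu_prime)    # left-hand side, computed exactly
print(tail, bound)
```

The exact tail is smaller than the bound, as the inequality requires; the bound captures the exponential rate of decay rather than the exact value.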

In our case the log moment generating function for $\Pr_1$, which will
be called $\mu_1(s)$, is given by

$$\mu_1(s) = \log \sum_{i,j} P_i \, p_i(j) \, e^{s \log \frac{p_i(j)}{Q_j}} = \log \sum_{i,j} P_i \, p_i(j) \left[\frac{p_i(j)}{Q_j}\right]^{s}$$

With regard to the $\Pr_2$ measure and estimation of $\Pr_2[I \ge \theta]$, it is
still possible to use the Chernoff bound for $s > 0$, even though $I$ has
"positive probability of being at $-\infty$." To see this, note that the moment
generating function $\nu_2(s)$ for the $\Pr_2$ measure is a well-defined function
for $s > 0$, namely,

$$\nu_2(s) = \sum_{i,j} P_i Q_j \left[\frac{p_i(j)}{Q_j}\right]^{s}$$

Furthermore, the generalized Chebycheff inequality with the monotone
increasing function $e^{sI}$ still holds for positive $s$.

$$e^{s\theta} \Pr[I \ge \theta] \le E[e^{sI}], \qquad s > 0$$

$$\Pr[I \ge \theta] \le e^{-s\theta} E[e^{sI}], \qquad s > 0$$
In particular, setting $\theta = n\mu_2'(s)$ where $\mu_2(s) = \log \nu_2(s)$, (this is the best
choice to give a good bound), we obtain

$$\Pr_2[I \ge n\mu_2'(s)] \le e^{n(\mu_2(s) - s\mu_2'(s))}$$

Note also that

$$\nu_2(s) = \sum_{i,j} P_i Q_j \left[\frac{p_i(j)}{Q_j}\right]^{s} = \nu_1(s-1), \qquad 0 < s \le 1$$

Thus the two generating functions have, in the common range of their
validity, a very simple functional relationship. It follows that $\mu_2'(s) =
\mu_1'(s-1)$. If we wish, in using the Chernoff bound, to place the cut-off
point for the tail, that is, $\theta$, at the same point, we must use $s_1$ and $s_2$
for $\mu_1$ and $\mu_2$ related by

$$\theta = n\mu_1'(s_1) = n\mu_2'(s_2)$$

This is achieved by making $s_2 = s_1 + 1$, since then $\mu_2'(s_2) = \mu_2'(s_1 + 1) =
\mu_1'(s_1 + 1 - 1) = \mu_1'(s_1)$. This is a unique solution, if we except the rather
degenerate case where $I$ is constant, since it is easily shown that in
all other cases $\mu''(s)$ is positive. Using $s_2$ and $s_1$ related in this
manner in the Chernoff bounds, we have


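$$\Pr_1[I \le n\mu_1'(s_1)] \le e^{n(\mu_1(s_1) - s_1\mu_1'(s_1))}, \qquad s_1 < 0$$

$$\Pr_2[I \ge n\mu_1'(s_1)] \le e^{n(\mu_2(s_2) - s_2\mu_2'(s_2))} = e^{n(\mu_1(s_1) - (s_1 + 1)\mu_1'(s_1))}, \qquad s_2 = s_1 + 1 > 0$$

Thus both bounds are now expressed in terms of $\mu_1$ and its derivative
with one parameter, $s_1$, which must be in the range $-1 < s_1 < 0$. Our
error probability $P_e$ is now bounded by (writing $e^{nR} = M$ and $s$ for $s_1$,
since we no longer need the subscript)

$$P_e \le e^{n(\mu_1(s) - s\mu_1'(s))} + e^{nR} e^{n(\mu_1(s) - (s+1)\mu_1'(s))}, \qquad -1 < s \le 0$$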


This bound holds for any $s$ in the allowed range. We wish to choose $s$ to roughly minimize the bound. This is done conveniently by equating the exponents for the two terms, since the first is monotone increasing in the range (its derivative is $-ns\mu_1''(s)$) while the other is monotone decreasing (its derivative is $-n(s+1)\mu_1''(s)$). Thus we set

\[ \mu_1(s) - s\mu_1'(s) = R + \mu_1(s) - (s+1)\mu_1'(s) \]

With $s$ chosen to satisfy this, that is, with $R = \mu_1'(s)$, the two terms are equal and therefore the bound on $P_e$ reduces to twice the first term. Thus

\[ P_e \le 2e^{n(\mu_1(s) - s\mu_1'(s))} \qquad \text{if } R = \mu_1'(s), \quad -1 < s \le 0 \]

If $\mu_1'(-1) < R < \mu_1'(0) = \sum_{i,j} P_i\, p_i(j) \log \frac{p_i(j)}{Q_j}$, there will be a unique $s$ in the allowed range satisfying $R = \mu_1'(s)$. This may be seen by noting that $\mu_1'(s)$ is a continuous monotone increasing function of $s$ as $s$ ranges from $-1$ to 0. Furthermore, if there are no $p_i(j) = 0$, then $\mu_1'(-1) \le 0$. This follows from the convexity property of the logarithm ($\sum_i p_i \log x_i \le \log \sum_i p_i x_i$ for $\sum_i p_i = 1$); we have

\[ \mu_1'(-1) \le \log \sum_{i,j} P_i\, Q_j\, \frac{p_i(j)}{Q_j} = \log \sum_{i,j} P_i\, p_i(j) = \log 1 \]

Hence, in this case, for each $R$ from 0 to the mean mutual information related to the probability assignment $P_i$ there is a unique $s$.
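As a numerical illustration of the bound above, the following Python sketch evaluates $\mu_1(s)$ and $\mu_1'(s)$ for a given channel, solves $R = \mu_1'(s)$ by bisection (which relies on the monotonicity of $\mu_1'$ just noted), and returns the exponent $E = s\mu_1'(s) - \mu_1(s)$, so that $P_e \le 2e^{-nE}$. The function names, the use of natural logarithms, and the requirement that all $P_i$ be positive are assumptions made for this example only.

```python
import math

def _terms(s, P, p):
    """Yield (weight, log p_i(j)/Q_j) for every pair with p_i(j) > 0.

    P[i] are the input probabilities (assumed positive), p[i][j] the
    transition probabilities; the weight is P_i p_i(j)^(1+s) Q_j^(-s).
    """
    n_out = len(p[0])
    Q = [sum(P[i] * p[i][j] for i in range(len(P))) for j in range(n_out)]
    for i in range(len(P)):
        for j in range(n_out):
            if p[i][j] > 0:
                yield (P[i] * p[i][j] ** (1 + s) * Q[j] ** (-s),
                       math.log(p[i][j] / Q[j]))

def mu1(s, P, p):
    """Log moment generating function mu_1(s)."""
    return math.log(sum(w for w, _ in _terms(s, P, p)))

def mu1_prime(s, P, p):
    """Derivative of mu_1: a tilted average of log p_i(j)/Q_j."""
    terms = list(_terms(s, P, p))
    return sum(w * info for w, info in terms) / sum(w for w, _ in terms)

def solve_s(R, P, p, tol=1e-12):
    """Bisection for R = mu_1'(s) on -1 < s <= 0 (mu_1' is increasing)."""
    lo, hi = -1.0, 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu1_prime(mid, P, p) < R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponent(R, P, p):
    """E such that P_e <= 2 exp(-n E) at rate R (nats per letter)."""
    s = solve_s(R, P, p)
    return s * mu1_prime(s, P, p) - mu1(s, P, p)
```

For instance, `exponent(0.2, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])` treats a binary symmetric channel with crossover probability 0.1 at a rate of 0.2 nats per letter; the call is only meaningful when $R$ lies between $\mu_1'(-1)$ and $\mu_1'(0)$.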

If there are some $(i, j)$ pairs with $p_i(j) = 0$, it is possible to have $\mu_1'(s)$ approach $R_0 > 0$ as $s$ approaches $-1$. This happens, for example, in the channel of Fig. 3,

FIG. 3

for which $R_0 = \log 1.2 > 0$. In such a case, the bound as written above applies only between $R_0$ and $\mu_1'(0)$, the average mutual information with the given probability assignment. We may, however, extend the bound to lower rates by an argument similar to that in (1). For rates $R$ satisfying $0 \le R \le R_0$ choose for the $a$ in the lemma a value less than $nI_{\min}$, where $I_{\min}$ is the smallest $\log \frac{p_i(j)}{Q_j}$ among $(i, j)$ pairs for which $p_i(j) > 0$ (that is, not including the $-\infty$ cases). We then have that $\Pr_1[I \le a] = 0$, since all cases with non-zero probability in this probability measure give $I$ values at least $nI_{\min}$. Also, $\Pr_2[I > a] = \big(\sum_{S_F} P_i Q_j\big)^n$, where $S_F$ is the set of $(i, j)$ pairs for which $p_i(j) \ne 0$ and hence the mutual information is finite. In fact, $\big(\sum_{S_F} P_i Q_j\big)^n$ is the probability that all corresponding letter pairs of $u$ and $v$ have finite mutual information. If any pair fails, the $u$ could not have been the true cause of $v$, since that letter would have involved a transition of zero probability. It follows that

\[ P_e \le M \Big(\sum_{S_F} P_i Q_j\Big)^n = e^{n\left(R + \log \sum_{S_F} P_i Q_j\right)} \]


Thus, in this region, the coefficient of $n$ in the exponent is a linear function of the rate $R$ of unit slope and intercept $\log \sum_{S_F} P_i Q_j$. It is readily seen that this straight line is tangent to the curve of the previous bound at the value $R = R_0$. However, the coefficient in $P_e$ has improved from 2 to 1.

Of course, in the ensemble of codes there must exist particular codes satisfying these same inequalities for error probability, since there is always one member of an ensemble at least as good as the average. Furthermore, if one were to choose samples from the ensemble of codes with their corresponding probabilities, then with probability at least $1 - \frac{1}{k}$ a sample will have an error probability less than or equal to $k\bar{P}_e$, where $\bar{P}_e$ is the average over the ensemble, for any $k > 0$. For example, with probability at least .9, the sample would have an error probability less than or equal to $10\bar{P}_e$. This is because $P_e$ is non-negative for each code, and if the probability of exceeding $k\bar{P}_e$ were more than $\frac{1}{k}$ the average would exceed $\bar{P}_e$, a contradiction.

Thus a code could be generated by a Monte Carlo process or by use of a book of random numbers with high probability of not exceeding the error probability bounds excessively.

Uniformly good codes

The bounds above refer, of course, to average error probability over the different messages when all messages are used with equal probability. It is also of interest to consider uniformly good codes for which each message has a low error probability. From a code of the first type it is possible to construct a uniformly good code with slightly lower rate and poorer error probability. (1) In fact, if we have a code with $M_1$ messages and error probability less than or equal to $P_{e1}$ (the messages used with equal probability), then at least half of the messages must have individual error probabilities less than or equal to $2P_{e1}$. This is the same combinatorial principle as used above. Thus we can find a code with $\frac{M_1}{2}$ (or $\frac{M_1+1}{2}$ if $M_1$ is odd) messages and a uniform error probability bound of $2P_{e1}$. This corresponds to a rate of essentially $R - \frac{1}{n}\log 2$ and a reliability of $E - \frac{1}{n}\log 2$, where $R$ and $E$ are those for the given code. In other words, the same $R$ and $E$ curves apply if displaced in both coordinates by $\frac{1}{n}\log 2$, a quantity which rapidly approaches zero as the code length $n$ increases.

Such uniformly good codes have the desirable feature of preserving the same bound on error probability even if the prior probabilities of different messages are changed. Indeed, they may be used if such message probabilities were entirely unknown or felt to be meaningless or non-existent in a particular situation.
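The halving argument can be sketched in a few lines of Python. The function name and the list-of-probabilities representation are assumptions made for illustration; the point is only that keeping the better half of the messages (rounded up) leaves none whose error probability exceeds twice the original average.

```python
def better_half(error_probs):
    """Indices of the ceil(M/2) messages with the smallest error probability.

    error_probs[m] is the error probability of message m in the given code.
    """
    keep = (len(error_probs) + 1) // 2
    order = sorted(range(len(error_probs)), key=lambda m: error_probs[m])
    return order[:keep]

# Example: five messages, one of them very poor.
probs = [0.0, 0.01, 0.02, 0.5, 0.03]
average = sum(probs) / len(probs)
kept = better_half(probs)
worst_kept = max(probs[m] for m in kept)
```

Here `worst_kept` cannot exceed `2 * average`, since otherwise more than half of the messages would each exceed twice the average and the average itself would be exceeded.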
Best bounds under variation of the $P_i$

The above bounds were deduced on the basis of an arbitrary assignment $P_i$ of input letter probabilities. To obtain the strongest results from these bounds one may, for any particular $R$, vary the $P_i$ and attempt to find the set which gives the minimum bound on error probability. Another way of looking at this is that the $E(R)$ bounding curves are found for all possible assignments and the envelope of these is used. It may be readily shown that if the channel has the "uniform input" property, then the best assignment is for all input letters to have equal assigned probability. A channel has the uniform input property if the output letters can be partitioned into a number of subsets $S_1, S_2, \ldots$ such that each output letter in any subset $S_k$ has the same set of transition probabilities coming into it and each input letter has the same set of transition probabilities going into $S_k$. A simple example is the erasure channel if both letters have the same probability of being erased.
Behavior near channel capacity

The first-order behavior of $E$, the coefficient of $n$ in the error probability exponent, for rates near channel capacity may be found by a power series expansion of $E(s)$ and $R(s)$ about the point $s = 0$. Thus

\[ E(s) = E(0) + sE'(0) + \frac{s^2}{2}E''(0) + \ldots \]
\[ = 0 + s\,[s\mu_1''(s)]_{s=0} + \frac{s^2}{2}\,[\mu_1''(s) + s\mu_1'''(s)]_{s=0} + \ldots \]

\[ R(s) = R(0) + sR'(0) + \ldots \]
\[ = C + s\mu_1''(0) \]

Eliminating $s$ between these two relations we obtain

\[ (R - C)^2 = s^2 (\mu_1''(0))^2 = 2E\,\mu_1''(0) \]

Thus, near channel capacity, the $E(R)$ curve is approximately parabolic with second derivative at $C$ equal to $\frac{1}{\mu_1''(0)}$. It is readily shown that $\mu_1''(0)$ is the variance of mutual information, and this approximation is related to a central limit theorem normal approximation to the distribution of mutual information near its mean. The approximate bound here near channel capacity is the same as that in (1), the two curves "osculating" at channel capacity and diverging appreciably only at lower rates.
(1) C. E. Shannon, "Certain Results in Coding Theory for Noisy Channels," Information and Control, Vol. 1, No. 1.

1.1 Introduction

Even "near perfect" elements may not be adequate for extremely complicated machines, or for machines whose failure might result in a catastrophe. Consider a complex machine made up of $10^6$ components, each of which has a probability of failure of $10^{-6}$ in any minute. This machine would be expected to fail once each minute, even though each particular component was expected to fail only once in ten years.

In case men's lives depend upon the successful operation of a machine, it is difficult to decide on a satisfactorily low probability of failure, and in particular, it may not be adequate to have men's fates depend upon the successful operation of single components, as good as they may be.

The following methods may be used to increase reliability:

1) Complete Redesign

For example, a digital computer may be used to replace an analog computer in order to gain accuracy.

2) Improve Components

For example, most relays now have double contacts and are several orders of magnitude more reliable than single contact relays.

3) Use Error Detecting Systems

For example, numbers may be represented in a computer or data transmission system in the 2 of 5 code, in which numbers are represented by all arrangements of two ones and three zeros in five bit positions on a paper tape or other medium. A component failure would probably result in a character which had too many or too few ones, and circuits which check the validity of the characters would detect most errors. Another example is the biquinary code, used in the arithmetic unit of the Bell Laboratories Relay Computer. In this code, numbers are represented by seven bits according to the following scheme:

    0    01 10000        5    10 10000
    1    01 01000        6    10 01000
    2    01 00100        7    10 00100
    3    01 00010        8    10 00010
    4    01 00001        9    10 00001

In the Bell Laboratories Relay Computer, error detection was used to enable error correction by having the machine check the validity of the coded numbers after each operation and repeat any operation which resulted in erroneous results.

4) Error Correction

For example, though the individual neurons in the brain fail, the brain usually continues to operate without a serious failure for many years.

The fourth method of improving reliability is the subject of this part of the course.

As an indication of the type of analysis that will be made, consider an unreliable machine which has an input and an output which may be any one of many symbols (for example a digital computer whose output is a ten digit number):

[Figure: a machine with an INPUT and an OUTPUT]

Fig. 1.1
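The validity checks mentioned under method 3 above can be sketched in Python as follows. The function names are hypothetical, and since the text does not say which 2 of 5 word stands for which digit, the sketch only enumerates the ten words; the biquinary encoder follows the seven-bit scheme tabulated above.

```python
from itertools import combinations

def two_of_five_words():
    """All 5-bit words with exactly two ones; there are C(5, 2) = 10."""
    return [tuple(1 if k in ones else 0 for k in range(5))
            for ones in combinations(range(5), 2)]

def valid_two_of_five(word):
    """A word is valid only if it has five bits and exactly two ones."""
    return len(word) == 5 and sum(word) == 2

def biquinary(d):
    """Seven-bit word for digit d: a 2-bit part followed by a 5-bit part."""
    bi = (0, 1) if d < 5 else (1, 0)
    qui = tuple(1 if k == d % 5 else 0 for k in range(5))
    return bi + qui

def valid_biquinary(word):
    """Valid only with exactly one 1 in each of the two parts."""
    return len(word) == 7 and sum(word[:2]) == 1 and sum(word[2:]) == 1

def flip(word, k):
    """Return the word with bit k inverted (a single component failure)."""
    return tuple(b ^ 1 if i == k else b for i, b in enumerate(word))
```

Inverting any single bit of a valid word changes the number of ones, so `valid_two_of_five(flip(w, k))` and `valid_biquinary(flip(w, k))` are false for every valid `w` and every position `k`; errors that change two bits in opposite directions can still pass unnoticed.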
Also consider a perfect majority device (i.e. the majority device itself never makes errors),

[Figure: a majority device M with three inputs and one output]

Fig. 1.2

which has three inputs. There is no output unless two of the inputs agree, in which case the common symbol is the output. Now consider three copies of the original machine with their inputs taken from the same source and their outputs connected to the majority device:

[Figure: three copies of the machine fed from a common INPUT, with their outputs connected to a majority device which supplies the OUTPUT]

Fig. 1.3


If $p = 1 - q$ is the probability that the output of one of the devices is correct, if the probabilities are independent, and if the probability that two of the three erroneously agree is negligible, the probability that the device shown in figure 1.3 will give the correct output is

\[ P = p^3 + 3p^2 q = (1-q)^3 + 3(1-q)^2 q = 1 - 3q^2 + 2q^3, \]

and

\[ Q = 1 - P = 3q^2 - 2q^3 \qquad (1.1) \]

If $q$ is small, $Q$ may be much smaller, while if $q$ is large, $Q$ may be considerably larger. For example, if $q = 10^{-6}$, $Q \approx 3\times10^{-12}$, while if $q = 0.7$, $Q = 0.784$. If $q = 1 - 10^{-6}$ then $Q \approx 1 - 3\times10^{-12}$.
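Formula (1.1) is easy to tabulate. The short Python sketch below (the function name is an assumption) evaluates $Q$ for any $q$ and makes the turning point visible: since $Q - q = -q(1-q)(1-2q)$, the majority device helps only when $q < 1/2$.

```python
def majority_error(q):
    """Q = 3q^2 - 2q^3: error probability after a perfect 3-way majority vote."""
    return 3 * q**2 - 2 * q**3

# Q is below q for q < 1/2, equal at q = 1/2, and above q for q > 1/2.
for q in (1e-6, 0.1, 0.5, 0.7):
    print(q, majority_error(q))
```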


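Frequently a complex device is made up of many devices connected in cascade:

[Figure: several devices connected in cascade]

Fig. 1.4

Instead of the system considered in Figure 1.3, consider the following system:

[Figure: the cascade in three parallel rows, with majority devices between the stages]

Fig. 1.5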

This system is at least as good as that shown in Fig. 1.3, because that system requires that all the blocks in two rows function correctly. This system will give the correct result in that case and in many others.

If the error probability for each of the four parts of the machine shown in figure 1.4 is $10^{-3}$, the error probability for the four in cascade is $q = 1 - (1-10^{-3})^4 \approx 4\times10^{-3}$. The majority organ would correct this to $Q = 3q^2 - 2q^3 \approx 48\times10^{-6}$. Using the scheme of figure 1.5, for each stage $q = 10^{-3}$, and hence $Q \approx 3\times10^{-6}$. Four stages cascaded will give an overall probability of error $1 - (1 - 3\times10^{-6})^4 \approx 12\times10^{-6}$, which is one fourth that obtained by the other system. If the probability for each part of the machine is taken as 0.2 instead of $10^{-3}$, the resulting error probabilities for the systems shown in figures 1.3 and 1.5 are .51 and .38 respectively.

The poor features of this system compared to that of figure 1.3 are 1) the cost of the majority devices, and 2) that in practice majority devices are not perfect, and they introduce errors also. If the machine is broken down into small enough parts the majority devices may introduce more errors than they correct.
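The arithmetic for the $10^{-3}$ case can be reproduced with the Python sketch below. It assumes perfect majority devices and independent failures, and it models the scheme of figure 1.5 as a vote after every stage; the function names are hypothetical.

```python
def majority_error(q):
    """Error probability after a perfect 3-way majority vote, formula (1.1)."""
    return 3 * q**2 - 2 * q**3

def cascade_error(q_stage, stages):
    """Probability that at least one of several cascaded stages is in error."""
    return 1 - (1 - q_stage) ** stages

def vote_at_end(q_part, stages):
    """Three complete cascades followed by one majority device (figure 1.3)."""
    return majority_error(cascade_error(q_part, stages))

def vote_each_stage(q_part, stages):
    """Each stage triplicated and voted on before the next (figure 1.5)."""
    return cascade_error(majority_error(q_part), stages)

end = vote_at_end(1e-3, 4)         # roughly 48e-6
staged = vote_each_stage(1e-3, 4)  # roughly 12e-6
```

Under these assumptions voting after every stage gives about a quarter of the error probability of a single vote at the end, in line with the figures quoted in the text for parts with error probability $10^{-3}$.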

1.2 Von Neumann's Probabilistic Logics

The basic scheme used by Von Neumann is to design the desired automaton from some sort of idealized components and then to describe a way of transforming this into a reliable automaton built from unreliable components.

The ideal components have a number of inputs and one output, as in the following diagram:

[Figure: an element with several inputs and one output y]

Fig. 1.6

The output variable $y$ and all the input variables take on only the values 0 and 1. The output is a function of the input variables,

\[ y = f(x_1, x_2, x_3, x_4) \]

but it is delayed by a time $\delta$. In the following analysis all elements will be assumed to have the same delay, and this will be used as the unit of time.

In designing an automaton from these elements it is assumed that any output can be branched and connected to any number of inputs, but that two outputs are never connected together.

These ideal elements might be thought of as idealized neurons or computer logical circuits, but we will consider them simply as a mathematical model without any particular interpretation.

The following special type of ideal element is particularly useful:

[Figure: an element with excitatory inputs and inhibitory inputs]

Fig. 1.7

The device may have any number of excitatory inputs and any number of inhibitory inputs. The device has a one as output only when

\[ N_e - N_i > h, \]

where $N_e$ is the number of excitatory inputs receiving 1's and $N_i$ is the number of inhibitory inputs receiving 1's. A bus for supplying constant 1's and one for constant zeros will be assumed.
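A minimal Python sketch of such an element follows, using the firing condition $N_e - N_i > h$ exactly as written above. The particular thresholds and wirings chosen for "and", "or", and "not" are assumptions made for illustration (one of several possible choices) and are not taken from the figures.

```python
def threshold_element(excitatory, inhibitory, h):
    """Output 1 only when (excited inputs) - (inhibited inputs) exceeds h."""
    n_e = sum(excitatory)   # number of excitatory inputs receiving a 1
    n_i = sum(inhibitory)   # number of inhibitory inputs receiving a 1
    return 1 if n_e - n_i > h else 0

def and_(x, y):
    # Both excitatory inputs must be 1: N_e > 1.
    return threshold_element([x, y], [], 1)

def or_(x, y):
    # At least one excitatory input must be 1: N_e > 0.
    return threshold_element([x, y], [], 0)

def not_(x):
    # Excitatory input tied to the constant-1 bus, x on the inhibitory input.
    return threshold_element([1], [x], 0)
```

The unit delay of each element is ignored here; the sketch captures only the logical function.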

Devices like these can be analyzed by the propositional calculus¹, defined as combinations of "and", "or", and "not" operations on variables and polynomials made from these variables by these three operations. These operations are defined as follows:

    Name      Symbol              Truth table

    "and"     $x_1 \cdot x_2$               x_2 = 0   x_2 = 1
                                  x_1 = 0      0         0
                                  x_1 = 1      0         1

    "or"      $x_1 + x_2$                   x_2 = 0   x_2 = 1
                                  x_1 = 0      0         1
                                  x_1 = 1      1         1

    "not"     $x_1'$              x_1     0    1
                                  x_1'    1    0

One noteworthy theorem from the propositional calculus states that any polynomial can be written uniquely in the following canonical form:

\[ p(x_1, x_2, \ldots, x_n) = \sum_{i_1, i_2, \ldots, i_n = 0}^{1} a_{i_1 i_2 \ldots i_n}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n} \qquad (1.3) \]

where $x^0 = x'$ and $x^1 = x$. The coefficients $a_{i_1 i_2 \ldots i_n}$ are essentially the truth table for $p$, and the proof consists essentially of noting that if the truth table of a polynomial $p$ is used as coefficients, the canonical form will agree with $p$.
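The canonical form (1.3) can be checked mechanically. In the Python sketch below (function names are assumptions), the coefficients are read off the truth table and the form is then evaluated as an "or" of "and" terms, with $x^1 = x$ and $x^0 = x'$.

```python
from itertools import product

def canonical_terms(p, n):
    """Index tuples (i_1, ..., i_n) whose coefficient a = p(i_1, ..., i_n) is 1."""
    return [idx for idx in product((0, 1), repeat=n) if p(*idx)]

def evaluate_canonical(terms, xs):
    """'or' over the terms of the 'and' of the factors x_k^{i_k}.

    The factor x^i equals 1 exactly when x == i, because x^1 = x and x^0 = x'.
    """
    return int(any(all(x == i for x, i in zip(xs, idx)) for idx in terms))

# Example polynomial: p = x * y' + z.
def p(x, y, z):
    return (x & (1 - y)) | z

terms = canonical_terms(p, 3)
agrees = all(evaluate_canonical(terms, xs) == p(*xs)
             for xs in product((0, 1), repeat=3))
```

`agrees` comes out true for any choice of `p`, which is the content of the proof sketched in the text.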


1. Couturat, The Algebra of Logic, Paris, 1905.
   Birkhoff & MacLane, A Survey of Modern Algebra, Macmillan, 1953.

Ideal elements which do the "and", "or", and "not" operations can be formed as follows:

[Figure: elements realizing "and", "or", and "not"]

Fig. 1.8


From these a device with an arbitrary function as output can be built. This can be proved by an induction on the number of "and", "or", and "not" operations in the expression. For n=1, the function can be formed with one of the basic elements shown in figure 1.8. Assuming that the statement is true for all functions containing no more than n operations, the device for a function with n+1 operations can be constructed as follows: consider the (n+1)st operation. Its operand(s) certainly contain no more than n operations, and therefore devices can be constructed which correspond to them. The outputs from these devices can be combined using the basic element corresponding to the (n+1)st operation to give the required device.

Any function can be generated in this manner, but there will be a delay. For any given function there is a minimum delay. The same function can be obtained with arbitrary delay greater than this minimum by using any number of "and" circuits as unit delay elements.

[Figure: an "and" element used as a unit delay]

Fig. 1.9 - Delay Element

In order to simplify the mathematical analysis, we wish to reduce the number of types of elements to a minimum. By using DeMorgan's Theorem:

\[ (x+y)' = x'y', \quad \text{or} \quad x+y = (x'y')' \]

the "or" can be obtained from "and" and "not" elements. Thus

[Figure: three circuits built from "and" and "not" elements, giving "or" with 3 units delay, "and" with 3 units delay, and "not" with 3 units delay]

Fig. 1.10

and these could be used as basic elements. Similarly, "and" can be obtained from "or" and "not".

The "not" operation cannot be obtained from "or" and "and", however. The "and" and "or" operations are monotone, i.e. increasing one of the arguments from 0 to 1 never causes the result to decrease from 1 to 0. Any combination of monotone operations will result in a monotone function. Hence "not", which is not monotone, cannot be obtained from any combination of "and" and "or" operations.

There is another way of organizing an automaton which does make "and" and "or" sufficient for obtaining all polynomials. It is the "double line trick" in which each variable is represented by two lines. A one is represented by a 1 on the first line of the pair and a 0 on the second, while a zero is represented by the opposite. With this convention, the "and", "or", and "not" can be obtained from "and" and "or" elements as follows:

[Figure: double-line circuits for "not" and "and"]

The "or" can be obtained by using DeMorgan's Theorem, i.e., by twisting both input lines and the output line. (It turns out that this is equivalent to interchanging the basic "and" and "or" elements in the diagram of the "and" circuit.) Note that the "not" circuit needs a delay of one unit to make it have the same delay as the "and" and "or".
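The double line trick can be sketched in Python as below. Each variable is a pair (first line, second line); the function names are assumptions, and the "not" is modelled simply as interchanging the two lines of a pair, which uses no element at all (delays are ignored).

```python
def encode(x):
    """A one is (1, 0); a zero is (0, 1)."""
    return (x, 1 - x)

def decode(pair):
    """Read the value back from the first line of the pair."""
    return pair[0]

def dl_not(a):
    # Interchange the two lines of the pair.
    return (a[1], a[0])

def dl_and(a, b):
    # "and" on the first lines, "or" on the second lines.
    return (a[0] & b[0], a[1] | b[1])

def dl_or(a, b):
    # The "and" circuit with its "and" and "or" elements interchanged.
    return (a[0] | b[0], a[1] & b[1])
```

Only "and" and "or" operations act on the individual lines, yet `decode(dl_not(encode(x)))` returns the complement of `x`.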

It was discovered by Sheffer that there are single functions from which all these, "and", "or", and "not", can be obtained. One is the Sheffer stroke function:

\[ (x|y) = (xy)' = x' + y' \qquad (1.4) \]

             y = 0   y = 1
    x = 0      1       1
    x = 1      1       0

In terms of it,

\[ x' = (x|x) \]
\[ x \cdot y = (x|y)\,|\,(x|y) \]
\[ x + y = (x|x)\,|\,(y|y) \]


The stroke function can be represented by an element of the following type:

[Figure: a stroke element]

Fig. 1.12

and "and", "or" and "not" circuits can be built from Sheffer stroke elements according to the above formulas. Note, however, that the "not" circuit requires one stroke function and hence only one unit delay, while the "and" and "or" circuits require 2 stroke functions cascaded, and hence two units of delay. This time the delay cannot be equalized. The stroke function is anti-monotone. In a device made from stroke functions, any points removed from the input by one stroke element will be anti-monotone. Any points two levels deep are monotone, etc. Thus the "not", which is anti-monotone, cannot be obtained at the same level, and hence time delay, as the "and" and "or", which are monotone.

Since "and" and "or" can be obtained at the same level, we can use the double-line trick to obtain "and", "or", and "not" in terms of stroke elements.
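The three formulas for the stroke function can be verified exhaustively. The Python sketch below (function names are assumptions) builds "not", "and", and "or" from a single `stroke` primitive; the comments record the number of cascaded stroke levels, which is the delay discussed above.

```python
def stroke(x, y):
    """Sheffer stroke: (x|y) = (xy)'."""
    return 1 - (x & y)

def not_(x):
    # One level: x' = (x|x).
    return stroke(x, x)

def and_(x, y):
    # Two levels: xy = (x|y)|(x|y).
    t = stroke(x, y)
    return stroke(t, t)

def or_(x, y):
    # Two levels: x+y = (x|x)|(y|y).
    return stroke(stroke(x, x), stroke(y, y))
```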


Another element of interest is the "majority organ":

[Figure: a majority organ with three inputs]

Fig. 1.13

It is monotone, but the "and" and "or" can be obtained from it as follows:

[Figure: majority organs giving $x \cdot y$ and $x + y$]

Fig. 1.14

and hence the majority organ is universal when used with the double line trick.
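One standard way to obtain "and" and "or" from a majority organ is to tie the third input to the constant-0 or constant-1 bus. The Python sketch below illustrates this; it is offered as an assumption about the construction, since the details of the figure are not reproduced here, and the function names are hypothetical.

```python
def majority(a, b, c):
    """Output 1 when at least two of the three inputs are 1."""
    return 1 if a + b + c >= 2 else 0

def and_(x, y):
    # Third input held at 0: both x and y must be 1.
    return majority(x, y, 0)

def or_(x, y):
    # Third input held at 1: one of x, y suffices.
    return majority(x, y, 1)
```

Because `majority` is monotone, no arrangement of majority organs alone yields "not" on single lines, which is why the double line trick is needed for universality.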
From the elements we have discussed we can build black boxes of the following kind, with any given set of propositional functions $f_i(x_1, x_2, \ldots, x_n)$, $i = 1, 2, \ldots, m$,

[Figure: a black box with inputs $x_1, x_2, \ldots, x_n$ and outputs $f_1(x_1, x_2, \ldots, x_n)$, $f_2(x_1, x_2, \ldots, x_n)$, ..., $f_m(x_1, x_2, \ldots, x_n)$]

Fig. 1.15

and we can line up the outputs by using delay elements. The notation can be shortened by writing $X$ for the vector $(x_1, x_2, x_3, \ldots, x_n)$ and $F(X)$ for the vector function $(f_1, f_2, \ldots, f_m)$.


A more general type of machine has outputs which depend not only on the input but also in some way upon the previous history of the device. One very general model of such a device is the "finite state transducer", which is a satisfactory representation of a digital computer, for example. At each time interval $i$ it is given an input (vector) $X^i$, it has a state (vector) $S^i$ which can assume a finite number of possible values, and produces an output (vector) $Y^i$:

[Figure: a finite state transducer with input X, state S, and output Y]

Fig. 1.16

They are related by the following equations:

\[ S^{i+1} = f(S^i, X^i) \]

\[ Y^i = g(S^i, X^i) \]
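The two equations translate directly into a short simulation loop. In the Python sketch below, `run_transducer` and the running-parity example are assumptions chosen for illustration; any pair of functions `f` and `g` over a finite set of states would do.

```python
def run_transducer(f, g, s0, inputs):
    """Apply Y^i = g(S^i, X^i) and S^{i+1} = f(S^i, X^i) along an input sequence."""
    s, outputs = s0, []
    for x in inputs:
        outputs.append(g(s, x))  # output depends on current state and input
        s = f(s, x)              # then the state is updated
    return outputs

# Example: two states; the state is the parity of the inputs seen so far,
# and the output is the parity including the current input.
def parity_f(s, x):
    return s ^ x

def parity_g(s, x):
    return s ^ x

result = run_transducer(parity_f, parity_g, 0, [1, 0, 1, 1])
```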

The relationship between finite state transducers and devices made of the basic elements is made clear by the following two theorems:

THEOREM Any device made by combining a finite number of basic elements is a finite state transducer.

If the output of the j-th element at time $i$ is denoted by $s_j^i$, then certainly $s_j^i$ is a function of the $s_j^{i-1}$ and the input, and the $s_j^i$ can be interpreted as the components of the state vector $S^i$. Then the output (vector) $Y^i$ is certainly a function of $S^i$ and $X^i$, since the outputs must come either from an element or directly from the input.

THEOREM Given the equations for a finite state transducer, such a transducer can be built of basic "and", "or" and "not" elements (or any other set of universal components), except that the interval of time between inputs and between outputs will be some multiple of the unit delay, and the outputs $Y$ may be delayed by some multiple of the unit delay.

Suppose there are $k$ states. They can be represented by the binary numbers from zero to $k-1$, and the binary digits of these numbers can be used as the components of the state vector $S^i$. Then the original given equations for the transducer become equations in binary variables:

\[ S^{i+1} = f(S^i, X^i) \]

\[ Y^i = g(S^i, X^i) \]

Black boxes of the type shown in Fig. 1.15 can be made corresponding to each of these equations, but they will have delays. Suppose each output from the first is delayed $d_1$ units, and each output from the second $d_2$ units. Then the finite state transducer is obtained by connecting them as follows:

[Figure: the f box with its state outputs fed back to its inputs, and the g box supplying the outputs]

Fig. 1.17

If an input is entered at times 0, $d_1$, $2d_1$, $3d_1$, etc., the inputs will synchronize with the state variables coming out of the $f$ box to satisfy the first equation, and the second box will produce the outputs according to the second equation, at times $d_2$, $d_1 + d_2$, etc.

(Actually, the machine could simultaneously process $d_1$ input sequences starting at times 0, 1, 2, ..., $d_1 - 1$ respectively and produce the $d_1$ output sequences similarly meshed.)


Problems:

1. a. Prove that it is possible to build a device with two inputs and one output which produces the sum of the input binary numbers for numbers of arbitrary length.

   b. Prove that this is not possible for multiplication. (It is also not possible for square root.)

2. Design a device from an infinite number of "and", "or", and "not" elements which is equivalent to a universal Turing machine.


The following are examples of types of machines which are not finite state transducers:

1) A device which has an infinite number of states, for example, a Turing machine with its infinite tape.

2) A device in which continuous variables occur, and hence there are again an infinite number of states. An analog computer is an example. In cases of this type the variables usually have strong continuity conditions and can be approximated to any desired degree by quantized variables. Hence the machine can often be approximated by a finite state transducer. (For example, a digital differential analyzer approximates an analog differential analyzer.)

3) A device which contains a random element. For example, a computer which makes unpredictable errors. A device with a random element might be very useful, for example in playing games of strategy in which a mixed strategy is called for.

Now we shall consider automata constructed of basic elements which sometimes fail, with the failures occurring according to some probability measure. We could assume a completely general probability measure on the space of all parts of the machine, i.e., we could include all sorts of correlation. Some correlation certainly occurs in real machines. Vacuum tube failures, for example, are frequently the result of the application of improper voltages. Since the voltages are usually applied to many tubes, a number of failures may result from one occurrence of improper voltage, and hence correlation appears among the failures. Likewise, in a relay machine most failures result from dust. The fact that one relay fails is an indication that dust is present and other relays are likely to fail also.

To assume a completely general probability measure would make the problem so difficult mathematically that we could hardly expect to accomplish anything.

We shall assume that the errors which occur in the different basic elements are independent. We shall use majority organs as basic elements and assume that the probability of erroneous output is $\epsilon$ regardless of the number of 1's at the input. (A physical realization of the majority organ might not have this property. It might be more reliable when the inputs are zeros than when they are ones, or it might be more reliable when all inputs are alike than with two ones and a zero or vice versa.) (A possible generalization of this would be to assume that the probability of failure of any element is less than $\epsilon$ regardless of the number of 1's at the input and regardless of the state of other parts of the machine.)

If the output from the machine appears on one line, the probability of error of the output is at least $\epsilon$ (except in the trivial cases in which it comes directly from the input or from a zero or one bus) simply because the output must come from a majority organ which has probability of error of $\epsilon$.

If $\eta_1$, $\eta_2$, $\eta_3$ are upper bounds on the error probabilities of the three inputs to a majority organ, then the probability of error for the output of the majority organ satisfies the inequality

n*^  ^i"*--*]* (1.7a) 

This gives an absolute upper bound on the error probability, regardless of any correlations which may exist. It does not offer any hope of improvement since it never promises any decrease in error probability at the output of the majority organ.

If we assume 1) that these probabilities are independent, and 2) that the three inputs agree if they are correct, a stronger result can be obtained. The probability that at least two of the inputs are incorrect is then

\[ \theta = \eta_1\eta_2(1-\eta_3) + \eta_1\eta_3(1-\eta_2) + \eta_2\eta_3(1-\eta_1) + \eta_1\eta_2\eta_3 \]
\[ = \eta_1\eta_2 + \eta_1\eta_3 + \eta_2\eta_3 - 2\eta_1\eta_2\eta_3. \qquad (1.7b) \]

The probability of an error in the output is the probability that either 1) at least two inputs are incorrect and the majority organ works properly, or 2) at least two inputs are correct and the majority organ makes an error (but not both); hence,

\[ \eta^* = \theta(1-\epsilon) + \epsilon(1-\theta). \qquad (1.8) \]

If $\eta_1 = \eta_2 = \eta_3 = \eta$,

\[ \theta = 3\eta^2 - 2\eta^3, \]

and

\[ \eta^* = \epsilon + (1-2\epsilon)(3\eta^2 - 2\eta^3). \qquad (1.9) \]

Now consider a machine which makes errors:

[Figure: a machine with its inputs and outputs]

Make three identical copies and connect the outputs to a majority organ.

[Figure: three copies of the machine with their outputs connected to a majority organ]

Fig. 1.18

Errors in the outputs can now be considered independent because they
occur in different machines. Also, the outputs will agree if they are
correct. Hence the above formula applies. But if this is to work, the
output error probability $\eta^*$ must be less than the input error probability
$\eta$.

From equation (1.9) it can be shown that $\eta^*$ considered as a
function of $\eta$ passes through the point (1/2, 1/2). It has zero slope when
$\eta$ = 0 or 1. The graph looks something like this:




The curve is tangent to the diagonal at the center for $\epsilon$ = 1/6. In
order for $\eta^*$ to be less than $\eta$ the curve must lie below the diagonal
for $\eta$ < 1/2, and hence $\epsilon$ must be less than 1/6. The curve for
$\epsilon$ = 1/12 crosses the diagonal at $\eta$ = 1/2 and also at $\eta$ = 0.15 and 0.85
approximately. For $\eta$ < 0.15, $\eta^*$ > $\eta$ and the error probability is
increased. For 0.15 < $\eta$ < 1/2, on the other hand, $\eta^*$ < $\eta$ and the
error probability is decreased. In either case iterating the procedure
makes the error probability approach 0.15. Thus this crossing acts
as a kind of stable point.
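
The fixed-point behaviour described above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names are illustrative, and the closed form in `lower_crossing` comes from factoring $(2\eta - 1)$ out of the fixed-point condition of equation (1.9). Evaluated this way, the lower crossing comes out near 0.113 for $\epsilon = 1/12$ and near 0.146 for $\epsilon = 1/10$.

```python
import math


def majority_step(eta, eps):
    """Equation (1.9): output error probability of a majority organ with
    element error eps, fed by three independent inputs with error eta."""
    return eps + (1.0 - 2.0 * eps) * (3.0 * eta**2 - 2.0 * eta**3)


def lower_crossing(eps):
    """Smaller root of (1 - 2 eps) eta^2 - (1 - 2 eps) eta + eps = 0.

    This quadratic is what remains of majority_step(eta) == eta after the
    root eta = 1/2 is divided out. Real roots require eps <= 1/6.
    """
    c = 1.0 - 2.0 * eps
    return 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * eps / c))


def iterate(eta, eps, rounds=200):
    """Apply the triplicate-and-vote step repeatedly, starting from eta."""
    for _ in range(rounds):
        eta = majority_step(eta, eps)
    return eta


if __name__ == "__main__":
    for eps in (1.0 / 12.0, 1.0 / 10.0):
        # Starting points on either side of the crossing end up at the crossing.
        print(eps, lower_crossing(eps), iterate(0.01, eps), iterate(0.4, eps))
```

Starting values on both sides of the crossing (but below 1/2) settle on the same point, which is the sense in which the crossing is stable.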

Now let us consider in more detail the design of a machine.
Consider a machine (designed on the assumption of error free components)
which has no feedback in it. It would be of the type shown in
Figure 1.15.

Theorem. Given an error-free design Q with no feedback we can construct
an equivalent machine Q* (with added delay). Each element of
Q* has error probability $\epsilon$, but the whole machine has error probability
less than $\eta_1(\epsilon)$. $\eta_1(\epsilon)$ is independent of machine complexity
and approaches zero as $\epsilon$ approaches zero.

The  proof  is  by  an  induction  on  the  depth  n  of  the  machine. 
The  theorem  is  obviously  true  for  n=0,  since  that  would  mean  all  out- 
puts come  directly  from  inputs  or  zero  or  one  buses.  Assume  that  it 
is  true  for  n=k,  and  consider  a  machine  of  depth  k+  1.  Since  there  is 
no  feedback,  all  the  outputs  from  the  majority  organs  at  the  greatest 
depth  must  be  connected  only  to  outputs  of  the  machine.  If  these  ele- 
ments are  removed,  the  rest  of  the  machine  will  have  depth  k: 


Fig. 1.20 (the two diagrams are marked "equivalent to")

Now by the induction hypothesis we can build this machine
of depth k with error probability less than $\eta_1(\epsilon)$. Build three copies
of it. Then connect each set of these corresponding outputs to a majority
organ. Finally connect these outputs to the k+1 layer of majority
organs:

Fig. 1.21

The error probability at the outputs of the devices of depth k
is less than $\eta_1(\epsilon)$ by the induction hypothesis. Also, the three boxes
are independent, and the outputs on corresponding lines, which go to a
correcting majority organ, will agree if they are correct. Therefore,
a bound on probability of error at the output of the correcting organs is
given by equation (1.9):

$$\eta^* = \epsilon + (1 - 2\epsilon)(3\eta^2 - 2\eta^3).$$

The probabilities at the inputs to the computing organs are less than $\eta^*$,
but they are not necessarily independent, nor need they agree if correct.
Therefore, we use equation (1.7a):

$$P_e \le 3\eta^* + \epsilon. \tag{1.10}$$

Combining these equations, we find

$$P_e \le 4\epsilon + 3(1 - 2\epsilon)(3\eta^2 - 2\eta^3) \tag{1.11}$$

for the probability of error at the output of the device as shown in
Figure 1.21. In order to complete the proof we have to assure that
$P_e \le \eta$, and it is this requirement that defines the function $\eta_1(\epsilon)$.

The curve of equation (1.11) is similar to the curve of
equation (1.9), except that the critical $\epsilon$ at which the curve becomes
tangent to the diagonal is approximately .0073.

Fig. 1.22

Clearly, if $\beta$ is any number such that $\eta_0 \le \beta < 1/2$ (where $\eta_0$ is
the point where the curve crosses the diagonal), then whenever $\eta < \beta$,
$P_e$ will be less than $\beta$ also. Therefore the function $\eta_1(\epsilon)$ can be any
function which satisfies the inequality $\eta_0 \le \eta_1(\epsilon) < 1/2$ for all $\epsilon$.
In particular $\eta_1(\epsilon) = \eta_0$ is acceptable.
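
The critical value quoted for equation (1.11) can be reproduced with a short numerical search. This is a sketch under stated assumptions rather than the method used in the notes: it takes the right-hand side of (1.11) to be $4\epsilon + 3(1 - 2\epsilon)(3\eta^2 - 2\eta^3)$, scans a grid of $\eta$ values in (0, 1/2), and bisects on $\epsilon$. The helper names, the grid size and the search bracket are arbitrary choices.

```python
def bound_step(eta, eps):
    """Right-hand side of equation (1.11) as read here."""
    return 4.0 * eps + 3.0 * (1.0 - 2.0 * eps) * (3.0 * eta**2 - 2.0 * eta**3)


def reaches_diagonal(eps, grid=2000):
    """True if the bound is <= eta somewhere on a grid over (0, 1/2)."""
    for k in range(1, grid):
        eta = 0.5 * k / grid
        if bound_step(eta, eps) <= eta:
            return True
    return False


def critical_eps(lo=0.0, hi=0.05, iters=40):
    """Bisect for the largest eps at which the curve still touches the diagonal.

    Assumes the curve reaches the diagonal at lo and not at hi, and that
    raising eps lifts the curve everywhere below eta = 1/2.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reaches_diagonal(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(critical_eps())
```

The search lands close to the .0073 stated in the text, with the point of tangency near $\eta = 0.06$.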

Note that the fact that there is no feedback plays a part in
this proof.

One variation on this system would be to iterate this triplicating
a number of times at each level of depth of the device. This
will permit using majority organs which have error probabilities greater
than .0073; it is possible to have $\epsilon$ at least as large as .125, and probably
very near 1/6.

Adding one level of depth triplicates all previous equipment
and adds some, so that the redesigned machine contains much more
than $3^n$ times the amount of equipment involved in the first level of depth.
Even for modest values of n, this makes a fantastically large machine.

Now we will consider another system, which is less sensitive
to errors on individual lines. It is called "multiplexing of lines". With
this system, one line in the original device is represented by a "bundle"
of many lines, most of which will carry a one when the corresponding line
in the original machine carries a one, and most of which will carry a zero when
the corresponding line carries a zero. The threshold level will be
denoted by $\delta$: if the fraction of lines excited in a bundle is less than
$\delta$, the bundle will be interpreted as carrying a zero. If the fraction
is greater than $1 - \delta$, it will be interpreted as a 1. If it is between $\delta$
and $1 - \delta$ the result will be considered as uncertain. This "fiduciary
level" $\delta$ does not enter into the machine, but only into the analysis
of the machine.

A  majority  organ  for  bundles  can  be  made  as  follows: 


Fig.  1.23 

If  all  the  lines  in  each  of  two  bundles  are  excited,  then  except  for  ma- 
jority function  errors,  all  outputs  will  be  excited.    Similarly,  it  works 
for  all  zeros  on  two  bundles,  so  that  the  device  works  roughly  as  it 
should. 

Now suppose fractions a, b, and c respectively of the three
inputs are in error. Neglect errors in the majority organs. Also suppose
that the first two bundles carry 1's, while the third carries a zero.
The  largest  number  of  errors  in  the  output  would  be  achieved  by  having 
all  the  zeros  in  the  first  bundle  matched  with  ones  of  the  second  and 
zeros  of  the  third.    Similarly  all  the  zeros  of  the  second  bundle  should 
be  matched  with  ones  of  the  first  and  zeros  of  the  third.    This  would 
make  a  fraction  a+b  of  the  outputs  wrong.    The  same  would  apply  if  the 
first  two  were  zeros  and  the  third  a  one,  by  the  duality  between  zero 
and  one  in  the  majority  organ. 

If  all  three  inputs  are  ones,  then  there  will  be  the  most  errors 
in  the  output  if  every  error  in  the  output  is  caused  by  two  erroneous  input 
lines.     The  number  of  errors  in  the  output  bundle  certainly  cannot  exceed 
half  the  total  number  of  erroneous  lines  in  all  three  bundles  at  the  input. 
Thus the fraction d of errors in the output satisfies

$$d \le \tfrac{1}{2}(a + b + c). \tag{1.12}$$

(This can almost be achieved if a, b, and c are the sides of a triangle.
Otherwise, d is less than the sum of the smallest two of a, b, c.)

If a = b = c, the bound on errors at the output is 2a for the
first case (not all three inputs the same) and (3/2)a for the second (all
three inputs the same). The bound we have on error probability at the
output of the organ (2a or (3/2)a) is thus greater than the bound a on
error probability at the input.

The error probability might decrease if we consider an average
situation instead of the worst possible situation. Consider the
case in which all three bundles are carrying the same symbol (0 or 1),
and take the average over all permutations of all the erroneous lines in
each of the input bundles. Then the probability of at least two erroneous
inputs to any given majority element is

$$d = ab(1 - c) + ac(1 - b) + bc(1 - a) + abc = ab + bc + ca - 2abc, \tag{1.13}$$

and this will also be the mean fraction of lines excited in the output
(assuming $\epsilon$ = 0). In any particular case some variation from this
would be expected.

If a = b = c,

$$d = 3a^2 - 2a^3, \tag{1.14}$$

the same equation which occurred before (Equations (1.1) and (1.9)
with $\epsilon$ = 0), but for a different reason.

Von Neumann proposed the following as a system for restoring
the level of the fraction of lines excited in a bundle (to mean 0 or 1).
Each line in a bundle is to be split three ways, to get 3n lines. Then
these would be put through a "random permutation" black box. The outputs
would be connected to majority organs:

Fig. 1.24
This black box might be wired so that each input is connected to one
and only one output according to a table of random numbers. The idea
is to achieve the effect of independence of the inputs to any one majority
organ so that formula (1.13) applies. There is no rigorous proof
that this can be done, but it seems very plausible.
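
A small simulation gives some feel for how well the random permutation achieves the intended independence. It is only an illustrative sketch: the majority organs are taken to be error free ($\epsilon = 0$), the black box is modelled by shuffling the 3n split lines, and `restore` and `excited_fraction` are names chosen here, not taken from the notes.

```python
import random


def restore(bundle, rng):
    """One restoring stage for a bundle given as a list of 0/1 line values.

    Each line is split three ways, the 3n lines are shuffled (standing in
    for the "random permutation" black box), and consecutive triples feed
    n error-free majority organs.
    """
    n = len(bundle)
    tripled = bundle * 3  # list repetition: every line appears three times
    rng.shuffle(tripled)
    return [
        1 if tripled[3 * k] + tripled[3 * k + 1] + tripled[3 * k + 2] >= 2 else 0
        for k in range(n)
    ]


def excited_fraction(bundle):
    """Fraction of lines in the bundle carrying a 1."""
    return sum(bundle) / len(bundle)


if __name__ == "__main__":
    rng = random.Random(0)
    a = 0.2
    n = 30000
    bundle = [1] * int(a * n) + [0] * (n - int(a * n))
    out = restore(bundle, rng)
    # Compare the simulated fraction with 3a^2 - 2a^3 from equation (1.14).
    print(excited_fraction(out), 3 * a**2 - 2 * a**3)
```

For large bundles the simulated fraction stays close to $3a^2 - 2a^3$; the scatter around that value is the dispersion taken up later in the notes.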

The same analysis can be done with Sheffer stroke organs.
It could be done indirectly by noting that a majority organ can be constructed
from any set of universal organs, and hence all results which
hold for majority organs hold also for any other set of universal organs.
The error probability $\epsilon$ would have to be that for the constructed majority
organ, of course, rather than that for the basic elements themselves.
The analysis for the stroke organs is simple and interesting
enough to do in detail.


The stroke function for a bundle can be constructed as follows:

Fig. 1.25


If both inputs are supposed to be on, the result should be zero and an
error will occur if either input to an organ is off. Therefore the number
of errors in the output can be as great as the sum of the number of
errors on both inputs, but it cannot exceed this number. Therefore the
fraction of errors c is bounded by a+b. If the first bundle has a 1 and
the second a zero, the answer is supposed to be one and would become
zero only with an error in the second line, so the fraction of errors in
the output cannot exceed b. Similarly, with a zero on the first bundle
and a one on the second, the fraction of errors in the output is no greater
than a. When both inputs are zeros, only two erroneous inputs would result
in an erroneous output, in which case c is no greater than the smaller
of a and b.
If the fraction of inputs excited is a for both inputs and the
average over all permutations is considered, then an output will be 0
only if both inputs are 1 to a particular stroke organ, and this would
occur on the average for a fraction $a^2$ of the lines. Therefore the fraction
of lines excited at the output would be

$$c = 1 - a^2. \tag{1.15}$$


The curve looks like this:

Fig. 1.26


It does not restore, but rather reverses. To get restoring,
the process should be done twice. The effect of the iteration can be
found by substituting (1.15) in itself as the argument, i.e.

$$a^* = 1 - (1 - a^2)^2 = 2a^2 - a^4. \tag{1.16}$$

Fig. 1.27
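
The restoring effect of applying the stroke stage twice can be seen by iterating the map numerically. The sketch below assumes error-free stroke organs and uses illustrative function names. Solving $a = 2a^2 - a^4$ gives fixed points at 0, 1 and $(\sqrt{5} - 1)/2 \approx 0.618$; in this idealized setting, levels above the middle value are driven toward 1 and levels below it toward 0.

```python
import math


def stroke(a):
    """Equation (1.15): mean fraction excited after one stroke stage."""
    return 1.0 - a * a


def double_stroke(a):
    """Equation (1.16): two stroke stages in cascade, 2a^2 - a^4."""
    return stroke(stroke(a))


def restore_level(a, rounds=6):
    """Apply the double-stroke restoring map several times."""
    for _ in range(rounds):
        a = double_stroke(a)
    return a


if __name__ == "__main__":
    threshold = (math.sqrt(5.0) - 1.0) / 2.0  # middle fixed point of the map
    for start in (0.1, 0.5, 0.7, 0.9):
        print(start, restore_level(start), start > threshold)
```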


To review the design procedure, we start with a single line
machine designed for error free stroke elements. Each line is replaced
by a bundle. Each organ is replaced by a bundle organ, followed by a
pair of cascaded Sheffer stroke restoring organs with "random permutation"
black boxes.
Until now we have not considered errors in the basic organs.
Furthermore we have not considered the effect of dispersion in the number
of lines excited in a bundle. We have shown only that the average
number of lines excited in a bundle can be kept under control. The probability
that the deviation from this average value will cause failure must
be considered. The number of lines excited has a distribution similar to
a binomial distribution, and in our case as with the binomial distribution,
the dispersion can be made very small by making the number of lines per
bundle very large.

The  rest  of  the  analysis  involves  a  considerable  amount  of 
algebraic  manipulation,  and  it  will  only  be  outlined  here. 

The  worst  case  will  obviously  occur  when  the  fraction  of  errors 
on  each  input  line  is  a  maximum,  and  therefore  the  calculation  is  made  for 
that  case.     The  probability  distribution  is  calculated  for  each  of  the  com- 
binations of  input  signals,  and  from  the  probability  distributions,  the  prob- 
ability that  the  fraction  of  errors  will  exceed  the  "fiduciary  level"  can 
be  found.    This  will  be  called  the  probability  of  error  for  this  part  of  the 
machine.    This  is  really  a  conservative  estimate  of  probability  of  error, 
since  the  machine  might,  and  probably  would,  function  perfectly  well  and 
have  the  fraction  of  errors  in  the  outputs  less  than  the  fiduciary  level  even 
though  the  fiduciary  level  might  be  exceeded  at  certain  points  within  the  ma- 
chine. 

Let  'X  and  jJ  be  the  fractions  of  inputs  carrying  l's  on  the  two 
inputs  of  a  stroke  function  for  bundles.  These  bundles  are  assumed  to  come 
from  different  randomizing  boxes  and  restoring  systems  and  hence  the  ar- 
rangement of  the  errors  in  the  two  bundles  can  be  considered  random  and  in- 
dependent. The  probability  distribution  for  the  number  of  individual  stroke 
elements  "excited"  by  two  I  s  can  be  calculated.  It  turns  out  to  be  approx- 
imately normal  for  large  bundles,  with 


mean  = 
variance 


(1.  17) 


where N is the number of lines in a bundle. This mean is consistent with
the average fraction of lines excited at the output calculated before, as given
in equation (1.15).

(1.18)


Assuming that the individual stroke elements have probability
$\epsilon$ of an error, the probability distribution of errors in the output
bundle is still approximately normal for large bundles, with

mean  =     Q\  k)  —  Z  &  (  fyj  -  fsj 

variance  =  [(l^)^  (|- fyfa-jj)  +■  fCl-OjV 

Now in order to continue the calculation, a fiduciary level
must be chosen. It is clear that it should not be either too near zero
or too near 1/2, and therefore there must be some optimum choice of
$\delta$. Von Neumann chose $\delta$ = 0.07, but he does not explain how he arrived
at that value. For this value of $\delta$, $\epsilon$ must be no larger than
.0107. If $\epsilon$ > .0107, it is not possible to make the probability of
error approach zero by making N approach infinity. If $\epsilon$ is near
.0107 the approach to zero as a function of N would be slow. Von
Neumann chose to make calculations for $\epsilon$ = .005.

Substituting these values in the probability distribution and
integrating over the tail gives the following approximate expression for
probability of error:

$$P_E \approx \frac{6.4}{\sqrt{N}} \, 10^{-8.6N/10{,}000}. \tag{1.19}$$

It is convenient to define a quantity E called reliability, as

$$E = -\log P_E. \tag{1.20}$$

For this situation,

$$E \approx \frac{8.6N}{10{,}000} - \log 6.4 + \tfrac{1}{2} \log N \approx \frac{N}{1000} \tag{1.21}$$

if the base 10 is chosen.

For a human brain, an overestimate of the mean free
number of single operations between errors is

$$(10^{13} \text{ components}) \cdot (10 \text{ operations per second}) \cdot (3600 \cdot 24 \cdot 365 \text{ seconds per year}) \cdot (100 \text{ years}) \approx 10^{23}.$$

This would require E to be 23, or N to be roughly 23,000. Now a
correction is necessary; since we estimated that the "brain" had $10^{13}$
elements and since it is already multiplexed according to our assumption,
the error free design must have had roughly $10^{13}/23{,}000 \approx 5 \times 10^{8}$ elements.
Using this figure in place of $10^{13}$, we find N to be roughly 18,500, a
modest change in N.

Each  prototype  element  requires  3N     elements,    N  for 
calculating,  and  2N  for  restoring.    Thus  the  number  of  elements  in  the 
machine  is  multiplied  by  something  like  55,  000. 

Cutting  these  figures  to  a  minimum,  perhaps  the  number  of 
elements  in  the  brain  should  be  taken  as  10^  and  the  time  to  a  matter  of 
days  instead  of  100  years  gives  an  estimate  of  the  number  of  operations 
between errors of roughly $10^{14}$ instead of $10^{23}$. This doesn't even cut N
to one half its previous value.

Of course the foregoing statements aren't meant to imply that
the brain is organized along these lines. In fact it almost certainly is not.
For small values of N this multiplexing makes the error probability greater,
and therefore gradual evolution of a system like this would be unlikely
to occur.

As another example, consider a computing machine of say
1000 elements with $10^5$ operations per second and perhaps a requirement
of 3 hours mean free time between errors. The mean number of operations
between errors would be roughly

$$1000 \times 10^5 \times (3600 \times 3) \approx 10^{12},$$

so that E should equal about 12. This would require an N of about 12,000,
or 36,000 times as many elements as in the original design. Of course
this assumed $\epsilon$ = .005, which is very poor compared to actual computer
elements.
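
The bundle sizes in these examples can be recomputed from the reliability formula. The sketch below is an illustration only: it reads equation (1.19) as $P_E \approx (6.4/\sqrt{N}) \, 10^{-8.6N/10{,}000}$ (valid for $\delta = 0.07$ and $\epsilon = .005$), uses base-10 logarithms as in (1.21), and introduces the helper `lines_needed`, which is not part of the notes. With the full expression rather than the rule of thumb $E \approx N/1000$, the required N comes out somewhat above 1000 E.

```python
import math


def error_probability(n_lines):
    """Equation (1.19) as read here, for delta = 0.07 and eps = 0.005."""
    return 6.4 / math.sqrt(n_lines) * 10.0 ** (-8.6 * n_lines / 10000.0)


def reliability(n_lines):
    """Equation (1.21) before the final rounding: E = -log10(P_E)."""
    return 8.6 * n_lines / 10000.0 - math.log10(6.4) + 0.5 * math.log10(n_lines)


def lines_needed(target_e):
    """Smallest bundle size N with reliability(N) >= target_e (target_e > 0).

    reliability() increases with N, so doubling followed by bisection works.
    """
    lo, hi = 1, 2
    while reliability(hi) < target_e:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reliability(mid) >= target_e:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for target in (12, 23):
        n = lines_needed(target)
        # Each prototype element needs 3N elements: N computing, 2N restoring.
        print(target, n, 3 * n)
```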


The  Portfolio  Problem  and  How  to  Pay  the  Forecaster 

These  notes,  taken  by  W.  W.  Peterson,  cover  several 
lectures  in  the  Seminar  on  Information  Theory  offered 
by  C.E.  Shannon  at  M.I.T.,  Spring  Term,  1956. 


The  Portfolio  Problem 


The  following  analysis,  due  to  John  Kelly,  was  inspired  by 
news  reports  of  betting  on  whether  or  not  the  contestant  on  the  TV  program 
"$64,000  Question"  would  win.    It  seems  that  one  enterprising  gambler  on 
the  west  coast,  where  the  program  broadcast  is  delayed  three  hours,  was 
receiving  tips  by  telephone  before  the  local  telecast  took  place.    The  ques- 
tion arose  as  to  how  well  the  gambler  could  do  if  the  communication  channel 
over  which  he  received  the  tips  was  noisy. 

Consider first the case where there are two equally likely
events on which the gambler may bet with 1-1 odds. Suppose the gambler
receives tips which he knows are correct. Then he can double his money
each time he bets. If he starts with $V_0$ dollars, after n bets he will have
$V_n = V_0 2^n$ dollars. This is equivalent to an interest rate of 100%. This suggests
the definition of effective interest rate r:

$$r = \frac{1}{n} \log_2 \frac{V_n}{V_0}. \tag{1}$$

Now consider the case in which the tips have only probability
$p \ge 1/2$ of being correct. Probability theory states that the expected winnings
are greatest when the gambler always bets all his money on the event which
his tip indicates is most likely to occur. His probability of going broke after
n bets, however, is equal to $1 - p^n$, and this approaches one as n approaches
infinity.

An alternative approach would be for the gambler to bet a fraction
e of his money on each bet. If he starts with $V_0$ and wins on the first bet
he will have $2eV_0 + (1 - e)V_0 = (1 + e)V_0$. If he loses he will have only $(1 - e)V_0$. It
is clear that each successive win multiplies his holdings by 1+e while each successive
loss multiplies his holdings by 1-e. After W wins and L losses he will
have

$$V_n = (1 + e)^W (1 - e)^L V_0$$

dollars. The effective interest rate is

$$r_n = \frac{W}{n} \log(1 + e) + \frac{L}{n} \log(1 - e).$$
n  11 


1. With interest rate i, after n periods,
$V_n = V_0 (1 + i)^n$.
Substituting this in (1) above gives
$r = \frac{1}{n} \log_2 (1 + i)^n = \log_2 (1 + i)$.
Thus r is a simple monotone function of the interest rate in the ordinary sense,
and maximizing r is equivalent to maximizing i.


When n is large we expect the fraction of wins to be roughly p, i.e. $W/n \approx p$,
while $L/n \approx q = 1 - p$.

Thus

$$r_n \approx G = p \log(1 + e) + q \log(1 - e).$$

This statement can be made more precise by using the laws of large numbers.
According to the weak law of large numbers, given any two positive numbers
$\epsilon$ and $\delta$ a number N can be found such that if $n \ge N$, the probability is at
least $1 - \delta$ that $|r_n - G| < \epsilon$. According to the strong law of large numbers,
given any two positive numbers $\epsilon$ and $\delta$ a number N can be found such that
the probability is at least $1 - \delta$ that $|r_n - G| < \epsilon$ after N bets and will remain
so no matter how many more bets are made. An equivalent statement is that
with probability one,

$$\lim_{n \to \infty} r_n = G.$$

No  matter  which  way  you  look  at  it,  as  the  number  of  bets  be- 
comes very  large,  the  gambler  becomes  more  and  more  certain  that  his  effec- 
tive interest  rate  will  be  very  close  to  G. 

G is a function of e which has a maximum for some value of e.
It is easily shown that the maximum occurs when 1+e = 2p, and hence 1-e = 2q.
This gives

$$G_{\max} = p \log 2p + q \log 2q = 1 + p \log p + q \log q = 1 - H(p),$$

so that $G_{\max}$ is equal to the rate of transmission over the channel by which
the tips are received!
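
The maximization can be verified numerically. This is a minimal sketch, assuming base-2 logarithms (which the form $1 - H(p)$ requires) and even odds; `growth_rate`, `entropy` and `best_fraction` are illustrative names, and the grid search merely stands in for the calculus argument the notes call easily shown.

```python
import math


def growth_rate(e, p):
    """G = p log2(1 + e) + q log2(1 - e) for a fraction e bet at even odds."""
    q = 1.0 - p
    return p * math.log2(1.0 + e) + q * math.log2(1.0 - e)


def entropy(p):
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def best_fraction(p, steps=10000):
    """Grid search for the betting fraction in [0, 1) that maximizes G."""
    return max((k / steps for k in range(steps)), key=lambda e: growth_rate(e, p))


if __name__ == "__main__":
    p = 0.75
    e_star = best_fraction(p)
    # Compare with the closed form e = 2p - 1 and with 1 - H(p).
    print(e_star, 2 * p - 1, growth_rate(e_star, p), 1 - entropy(p))
```

For p = 0.75 the search returns a fraction of one half, matching 1+e = 2p, and the growth rate agrees with $1 - H(p)$.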

If  one  gambler  bets  always  the  optimum  fraction  of  his  holdings 
while  a  second  bets  a  non-optimum  fraction  of  his  money  on  each  bet,  the  effec- 
tive interest  rate  for  the  first  approaches  G  max  with  probability  one  while  that 
for  the  second  approaches  some  lower  value.    It  follows  that  the  probability  ap- 
proaches one  as    n   approaches  infinity  that  the  first  gambler  will  have  more 
money  than  the  second.    The  same  result  holds  if  the  second  gambler  does  not 
bet  a  constant  fraction  of  his  money  on  each  bet  as  long  as  he  deviates  from  the 
optimum  by  at  least  some  fixed  amount  or  at  least  a  fixed  fraction  of  the  bets. 


1. In information theory the problem often occurs of maximizing an expression
of the form

$$\sum_i A_i \log x_i$$

by optimum choice of the $x_i$ subject to the constraint that their sum is constant.
The solution is that the $x_i$ are proportional to the $A_i$.


In  other  words  if  one  gambler  bets  according  to  the  above  scheme  and  a  second 
according  to  any  significantly  different  scheme,  the  probability  approaches  one 
as  n  approaches  infinity  that  the  first  gambler  will  have  more  money  than  the 
second  after  n  bets. 

This is not to say that this method of betting is the only way a
"rational" man would behave. While it is very persuasive in a general way, there
are situations and systems of values or utilities which would lead to other methods
of play. Thus if the (remote) possibility of the extreme winning of $2^n V_0$ were
sufficiently important (e.g. the only possible way to save the gambler's life) he
would be well advised to bet maximum expectation (all on the most probable event).

Now consider the more general problem in which there are m
events (outcomes of a horse race, for example) with probabilities $P_1, P_2, \ldots, P_m$.
The gambler receives a tip, one of n messages, which may not be reliable, perhaps
because of noise in the communication channel. But the gambler is assumed
to know how reliable the tips are by knowing the probability of tip j if event i
occurred (or will occur):

$p_i(j)$ = probability of tip j if event i occurs.

In addition to this the odds are assumed known:

$\alpha_i$ = dollars returned per dollar bet if i occurs.

The odds will be called fair if

$$P_i \alpha_i = 1, \tag{7}$$

and if the equality

$$\sum_i \frac{1}{\alpha_i} = 1 \tag{8}$$

holds, we shall say there is "no track take". (Note that "fair odds" implies
"no track take" since, by (7), $1/\alpha_i = P_i$ and $\sum_i P_i = 1$.) "No track take"
turns out to simplify the analysis greatly, since it permits covering bets with
no loss, and hence makes betting all of one's holdings on every bet no less
general than permitting holding back part of one's money. Note that if one bets
$1/\alpha_i$ dollars on each event, he will have bet exactly one dollar and will have one
dollar returned regardless of the outcome.

As an example, in pari-mutuel betting, the track takes a certain
percent of all money bet and divides the rest among the people who bet on
the winning horse. If the track takes t percent, and if $n_i$ dollars are bet on the
ith event, the odds are

$$\alpha_i = \frac{(1 - t) \sum_k n_k}{n_i} \tag{9}$$

and

$$\frac{1}{\alpha_i} = \frac{1}{1 - t} \cdot \frac{n_i}{\sum_k n_k}. \tag{10}$$

If there is no track take, t = 0, and

$$\sum_i \frac{1}{\alpha_i} = 1.$$
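
As a worked illustration of the pari-mutuel example, the sketch below computes odds from a hypothetical pool. It assumes that the take t is expressed as a fraction between 0 and 1 and that $\alpha_i$ is the whole remaining pool divided by the money bet on the winner; the function name and the sample figures are invented for the example.

```python
def pari_mutuel_odds(pool, take):
    """Dollars returned per dollar bet on each event.

    pool[i] is the money bet on event i; take is the track's share of the
    whole pool, written here as a fraction (0.15 means fifteen percent).
    """
    total = sum(pool)
    return [(1.0 - take) * total / n_i for n_i in pool]


if __name__ == "__main__":
    pool = [50.0, 30.0, 20.0]
    for take in (0.0, 0.15):
        odds = pari_mutuel_odds(pool, take)
        # The reciprocals sum to 1 / (1 - take), so to 1 with no track take.
        print(take, odds, sum(1.0 / a for a in odds))
```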


The gambler's strategy can be described by giving the percent
of his holdings which he will bet on event i if he receives tip j. This will be
denoted by a(i/j).

First let us assume fair odds, which implies no track take. As
was stated above, this means there is no loss of generality in assuming that the
gambler bets all his holdings, since he can cover bets with no risk of loss. Then
each bet multiplies his holdings by a factor $a(i/j)\alpha_i$ if event i occurs and he had
received tip j. Suppose W(i, j) denotes the number of times he received tip j and
event i occurred in a total of n bets.

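Then

$$V_n = \prod_{i,j} \left[ a(i/j) \alpha_i \right]^{W(i,j)} V_0. \tag{11}$$

This gives an effective interest rate

$$r_n = \sum_{i,j} \frac{W(i,j)}{n} \log \left[ a(i/j) \alpha_i \right], \tag{12}$$

which has as its limit with probability one,

$$G = \sum_{i,j} P_i \, p_i(j) \log \left[ a(i/j) \alpha_i \right]. \tag{13}$$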

The  relationship  between  r  and  G  is  the  same  as  in  the  simple  case  discussed 
first. 

With fair odds, $\alpha_i = 1/P_i$, and hence

$$G = \sum_{i,j} P_i \, p_i(j) \log a(i/j) - \sum_{i,j} P_i \, p_i(j) \log P_i. \tag{14}$$

Summing on j first and noting that $\sum_j p_i(j) = 1$, the last term
becomes

$$-\sum_i P_i \log P_i = H(x). \tag{15}$$
Because of "no track take" we can assume that the gambler will bet all his
money, i.e. we can assume the constraint $\sum_i a(i/j) = 1$, and we can maximize
separately the parts of the sum in (14) for each value of the index j. As
before (equation (6)), the a(i/j) must be proportional to $P_i \, p_i(j)$. Since
$\sum_i a(i/j) = 1$,

$$a(i/j) = \frac{P_i \, p_i(j)}{\sum_k P_k \, p_k(j)} = \frac{p(i,j)}{Q(j)} = q_j(i), \tag{16}$$

where p(i, j) is the probability that i occurred and tip j was received, Q(j)
is the probability of tip j, and $q_j(i)$ is the probability that event i occurred
if tip j was received. Then


$$G_{\max} = \sum_{i,j} p(i,j) \log q_j(i) + H(x) = H(x) - H_y(x) = R,$$

where  x  represents  the  event  and  y  the  tip.    But  again  this  is  just  the  rate 
of  transmission  over  the  communication  channel  carrying  the  tip! 
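
The identity between the best growth rate and the rate of transmission can be checked on a small example. The following sketch is illustrative only: it assumes fair odds $\alpha_i = 1/P_i$, base-2 logarithms, and bets $a(i/j) = q_j(i)$ as in equation (16). The function name and the data layout (`prior[i]` for $P_i$, `channel[i][j]` for $p_i(j)$) are choices made here.

```python
import math


def kelly_rate_fair_odds(prior, channel):
    """Return (G_max, R) for fair odds.

    G_max is the growth rate when betting a(i/j) = q_j(i); R is the
    transmission rate H(x) - H_y(x) of the channel carrying the tips.
    """
    m, n = len(prior), len(channel[0])
    joint = [[prior[i] * channel[i][j] for j in range(n)] for i in range(m)]
    tip = [sum(joint[i][j] for i in range(m)) for j in range(n)]

    g = 0.0
    h_x_given_y = 0.0
    for i in range(m):
        for j in range(n):
            if joint[i][j] > 0.0:
                posterior = joint[i][j] / tip[j]  # q_j(i), also the bet a(i/j)
                fair_odds = 1.0 / prior[i]        # alpha_i under fair odds
                g += joint[i][j] * math.log2(posterior * fair_odds)
                h_x_given_y -= joint[i][j] * math.log2(posterior)
    h_x = -sum(p * math.log2(p) for p in prior if p > 0.0)
    return g, h_x - h_x_given_y


if __name__ == "__main__":
    # Two equally likely events, tips correct with probability 0.75.
    print(kelly_rate_fair_odds([0.5, 0.5], [[0.75, 0.25], [0.25, 0.75]]))
```

For the two-event example the two returned numbers coincide and equal the $1 - H(p)$ of the simple case treated first.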

Now suppose that the odds are not necessarily fair, but that
there is still no track take. The only change is that we cannot assume that
$\alpha_i P_i = 1$, and hence the last term is $\sum_i P_i \log \alpha_i$ instead of $-\sum_i P_i \log P_i$. Denoting
this by $H(\alpha)$, G becomes

$$G = -H_y(x) + H(\alpha) = -H_y(x) + H(x) + H(\alpha) - H(x) = R + R_0,$$

where R is the rate of transmission of information and $R_0 = H(\alpha) - H(x)$.
$R_0$ is independent of the tips, and hence we can see its significance by considering
the case where the tips give no information. Then $R_0$ is the maximum
effective interest rate possible with no tips. $R_0$ is greater than or
equal to zero, and it equals zero only when $1/\alpha_i = P_i$, i.e. fair odds. $R_0$
represents the maximum effective interest rate achievable by taking advantage
of the fact that the odds are not fair.
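
The extra term can be evaluated directly for a given set of odds. The sketch below is an assumption-laden illustration: it reads the term as $R_0 = \sum_i P_i \log \alpha_i + \sum_i P_i \log P_i = \sum_i P_i \log(P_i \alpha_i)$, uses base-2 logarithms, and presupposes no track take ($\sum_i 1/\alpha_i = 1$). The function name and the sample odds are hypothetical.

```python
import math


def odds_advantage(prior, odds):
    """R_0 = sum_i P_i log2(P_i * alpha_i), assuming sum_i 1/alpha_i = 1."""
    return sum(p * math.log2(p * a) for p, a in zip(prior, odds) if p > 0.0)


if __name__ == "__main__":
    prior = [0.5, 0.5]
    fair = [2.0, 2.0]            # alpha_i = 1 / P_i
    unfair = [4.0, 4.0 / 3.0]    # reciprocals still sum to one
    print(odds_advantage(prior, fair), odds_advantage(prior, unfair))
```

With fair odds the term vanishes, and with the mismatched odds it is positive, in line with the statement that it is never negative when there is no track take.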


It is interesting to note that it is best to bet an amount of
money a(i/j) proportional to $q_j(i)$ regardless of the odds. One would think
that to take best advantage of unfair odds the bets should be adjusted differently
for different odds, but this is not the case, at least for this type of betting.

The case in which there is a track take is considerably more difficult
mathematically, so the results will only be outlined
here. In general the gambler should hold back some money. Arrange the
events in order of decreasing expectation (conditional on the available information),
i.e. in order of goodness of the bets. At some point a line is drawn
and bets placed only on the events above the line. Bets are made in proportion
to the conditional probability of their occurrence, holding back some of the
money. It turns out generally that some of the events bet on, the ones just
above the line, have expectation less than one, i.e. $q_j(i) \alpha_i < 1$, even though
such bets would seem to be quite poor.
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How  to  Pay  the  Forecaster 

The  following  analysis  was  considered  by  I.J.  Good  in 
England,  and  by  Andy  Gleason  of  Harvard  University.    The  problem  con- 
cerns piecework  payment  to  a  consultant  for  predictions,  the  payment  to 
be  made  according  to  how  good  the  prediction  is. 

Instead  of  the  simple  weather  forecasts  which  are  cus- 
tomarily made,  use  a  more  sophisticated  system  in  which  probabilities 
are  given  for  each  possible  weather  event.    For  example  the  weather  man 
might  say,  "The  probability  is  one-half  that  it  will  snow,  one-sixth  that  it 
will  rain,  and  one-third  that  it  will  be  fair". 

Now  let  us  suppose  that  the  client  wishes  to  pay  the  forecaster 
day-by-day,  and  by  merit.    Thus  it  would  seem  that  a  relatively  high  fee  should 
be  paid  if  the  forecaster  assigns  a  high  probability  to  the  event  which  actually 
occurs,  and  a  low  fee  should  be  paid  if  the  forecaster  assigns  a  low  probability. 
But  exactly  what  function  of  p? 

Now let us consider the forecaster's viewpoint. Let us suppose
that he is more worried about how much money he will be paid than about good
forecasting. Let us assume that the function of p, f(p), which is his payment,
is known to him (as part of his contract) and let us assume he knows the
probabilities of the various events which he is attempting to forecast. Then he might
attempt to optimize mathematically his payoff by reporting a number $a_i$ as the
probability of event i instead of its true probability $p_i$. His expected payoff in
that case would be

\[ \sum_i p_i f(a_i) \]

which he would maximize subject to the constraint $\sum_i a_i = 1$, since the $a_i$ must
look to the client like probabilities. Using the method of Lagrangian multipliers,
we find that the $a_i$ satisfy the equation

\[ p_i f'(a_i) + \lambda = 0 \]

for each value of i. These equations together with the constraint equation enable
the forecaster to solve for the prediction $a_i$ which will pay best.

Now, getting back to the client's viewpoint, he would like the
prediction which he receives to equal the actual probability, i.e. $a_i = p_i$. This
will be the case if and only if

\[ p_i f'(p_i) + \lambda = 0 \]

for all $p_i$, or in other words if

\[ x f'(x) + \lambda = 0 \]

The solution of this differential equation is

\[ f(p) = -\lambda \log p + C \]

and if this is to be a maximum, the second derivative should be negative,
or $\lambda$ should be negative. Hence

\[ f(p) = A \log p + B, \qquad A > 0 \]

Now consider what the average payment is:

\[ P_{\mathrm{ave}} = A \sum_i p_i \log p_i + B = B - A\,H(x) \]

The forecaster is paid a fixed salary from which is deducted an amount
proportional to the client's uncertainty about the predicted event after the
prediction!
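
The conclusion can be tried out numerically. The following sketch is an illustration only: the constants A = 2 and B = 5, the function names, and the use of natural logarithms are all assumptions made for the example. It uses the weather man's probabilities from the text and compares honest reporting with a shaded report.

```python
import math

def expected_payoff(true_probs, reported, a=1.0, b=0.0):
    """Expected fee sum_i p_i * (A log a_i + B) when a_i is reported
    and p_i is the true probability."""
    return sum(p * (a * math.log(r) + b)
               for p, r in zip(true_probs, reported) if p > 0)

def entropy(probs):
    """H(x) in natural units."""
    return -sum(p * math.log(p) for p in probs if p > 0)

truth = [1 / 2, 1 / 6, 1 / 3]  # snow, rain, fair
honest = expected_payoff(truth, truth, a=2.0, b=5.0)
shaded = expected_payoff(truth, [0.6, 0.1, 0.3], a=2.0, b=5.0)

print(honest, shaded)               # honest reporting pays more
print(5.0 - 2.0 * entropy(truth))   # equals the honest expected fee, B - A H(x)
```

Any report other than the true probabilities lowers the expected fee, so under a logarithmic payment the forecaster's self-interest and the client's interest coincide.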


NOTES ON RELATION OF ERROR PROBABILITY TO DELAY IN A NOISY CHANNEL
Lecture by C. E. Shannon, August 30, 1956

The ordinary coding theorems assert something about what can
be  done  in  the  limit  of  very  long  codes.    They  do  not  give  information 
as  to  how  long  the  code  must  be  to  approach  within  a  certain  tolerance 
of  the  limiting  behavior.    This  question,  the  relation  of  probability 
of  error  and  length  of  code,  is  of  considerable  interest.    Results  here 
bear  about  the  same  relation  to  earlier  results  as  the  central  limit 
theorem  in  probability  bears  to  the  law  of  large  numbers.    In  fact,  at  a 
key  point  in  proving  the  theorems,  the  law  of  large  numbers  is  used  in 
the  first  case  and  a  generalization  of  the  central  limit  theorem  in  the 
second  case. 

The  first  type  coding  theorem  relates  to  coding  a  source  into 
binary  digits  (say).    If  the  source  produces  letters  at  a  regular  rate 
and  block  coding  is  to  be  used  a  result  may  be  obtained  relating  error 
probability  (this  is  here  the  probability  of  rare  sequences  for  which  no 
binary  sequences  are  available)  and  the  rate  at  which  binary  digits  are 
available. It is convenient to use a measure of reliability, E, rather than
probability of error directly,

\[ E = \frac{1}{n} \log P_e^{-1} \]

where n is the block length and $P_e$ the probability of error with best
coding. As n increases, E approaches a limit in the case of sources
described by a Markoff process. For the simplest case, that in which
the language consists of a sequence of letters chosen independently from
a finite alphabet with probability $p_i$ for the ith letter in the alphabet,
the limiting E can be given in parametric form (parameter s) as follows.


Let $q_i(s) = p_i^{s} / \sum_j p_j^{s}$. Then if E(s) is the limiting
reliability and R(s) the rate of binary digits available for coding
(per letter of text), we have

\[ E(s) = \sum_i q_i(s) \log \frac{q_i(s)}{p_i} \]

\[ R(s) = \sum_i q_i(s) \log q_i(s)^{-1} \]

A  complete  solution  can  also  be  given  in  the  general  Markoff  case  but 
is  more  involved. 
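
A parametric description of this kind is easy to tabulate. The sketch below is a hedged illustration, not the lecture's own computation: it assumes the tilted family q_i(s) = p_i^s / Σ_j p_j^s, takes the reliability as the divergence of q(s) from p and the rate as the entropy of q(s), and uses base-2 logarithms. The function names and the three-letter source are made up.

```python
import math

def tilted(probs, s):
    """Assumed tilted distribution q_i(s) proportional to p_i ** s."""
    weights = [p ** s for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

def reliability_and_rate(probs, s):
    """Return (E, R): divergence of q(s) from p, and entropy of q(s), in bits.
    All letter probabilities are assumed strictly positive."""
    q = tilted(probs, s)
    e = sum(qi * math.log2(qi / pi) for qi, pi in zip(q, probs))
    r = -sum(qi * math.log2(qi) for qi in q)
    return e, r

probs = [0.7, 0.2, 0.1]  # hypothetical three-letter source
for s in (1.0, 0.75, 0.5, 0.25, 0.0):
    e, r = reliability_and_rate(probs, s)
    print(f"s={s:4.2f}  R={r:.4f}  E={e:.4f}")
```

Under these assumptions s = 1 gives R equal to the source entropy with E = 0, and as s falls to 0 the rate rises to the logarithm of the alphabet size while the reliability grows.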

The second type of coding theorem relates to coding a sequence
(say) of binary digits into a noisy channel in such a way as to have
a small probability of error after decoding. The problem involving delay
in this case is to determine for a block length of code n and an input
rate R the probability of error for the optimal code. We limit ourselves
to discrete memoryless channels with finite alphabets. It is convenient
also to use a reliability measure $E = \frac{1}{n} \log P_e^{-1}$.

The problem is that of estimating E as a function of R, or, as it
turns out, E and R as functions of s. Upper and lower bounds are found
on the probability of error for codes by a number of different arguments.
The most powerful argument for showing the existence of codes is by the
random coding procedure. Random codes are improved when the rate R is
small by an expurgating procedure. This is the elimination of code
words which are particularly close together. To establish lower bounds
on the probability of error, the most powerful argument is by the
sphere packing method. This is the generalized analog of arguments to
the effect that one cannot get more than V/v spheres of volume v in a room
of volume V. The expurgated random code and the sphere packing argument
determine the asymptotic E exactly for rates R between a certain critical
value and the channel capacity. In fact, as one approaches channel
capacity the optimal probability of error for a given delay is more and
more nearly determined. For rates below the critical rate, the bounds
diverge. Another type of lower bound on probability of error, suggested
by Elias for the binary symmetric channel, becomes more powerful in
evaluating E. This is a bound based on the minimum separation between
words in a code. It turns out that for rates near zero the probability
of error is controlled chiefly by code words which are "close together".


In most communication studies the analysis stops when the message
is received. No action based on the message is contemplated. John Kelly
has considered a problem in which action is taken based on the received
message, namely, the messages are assumed to be tips on the outcome of
events and a gambler may place bets on these events. The problem is to
determine the gambler's optimal system of betting and the value of the
channel to him. It is assumed that the channel keeps operating and that
the gambler can reinvest his winnings. If after n plays of this game
the gambler has $V_n$ dollars, we define his effective interest rate as
$G = \frac{1}{n} \log_2 (V_n / V_0)$. We assume M events (e.g. entries in a horse race)
with probabilities of occurrence $p_1, p_2, \ldots, p_M$. The gambler receives a
tip, one of n messages, which may not be reliable, but the gambler knows
the probability $p_i(j)$ of tip j if event i occurs. The available odds
for betting are $\alpha_i$ dollars paid per dollar bet if i occurs. Odds are
called fair if $\alpha_i p_i = 1$. We say there is no track take if $\sum_i 1/\alpha_i = 1$.
In the case of no track take, it is possible to effectively hold back a
dollar by betting $1/\alpha_i$ dollars on event i for each i, since then one dollar
is bet and one dollar always returned. Thus without loss of generality
all the capital can be bet each time.

Assuming fair odds (this implies no track take), it turns out that
the expected interest rate is maximized if the gambler bets money on
event i when tip j is received in proportion to $p_j(i)$, the probability
of event i given tip j. When he bets this way his interest rate turns out to be

\[ G = H(x) - H_y(x) = R \]

That is, his interest rate is the rate of transmission in communication
theory over the channel carrying the tip. His interest rate is better
than that of any gambler who deviates significantly from this strategy
(with probability 1), that is, any gambler who does not bet this way a
fraction of time $> \epsilon > 0$.

If there is no track take but the odds are not necessarily fair,
it turns out that the best interest rate becomes

\[ G = R + R_0 \]

where R is the rate of transmission for the channel and $R_0$ is the
effective interest rate with no tips. It is the rate of interest
one can obtain from the fact that the probabilities $p_i$ are not equal to the
betting odds.

The situation is somewhat more complex when there is a track take.


Reference: John Kelly: "A New Interpretation of Information Rate."
Bell System Technical Journal, July, 1956.


The  Fourth-Dimensional  Twist 
or 

A  Modest  Proposal  in  Aid  of  the  American  Driver  in  England 
Claude  E.  Shannon 

An  American  driving  in  England  is  confronted  with  a  wild  and 
dangerous  world.     The  cars  have  the  driver  on  the  right  and  he  is 
supposed to drive on the left side of the road. It is as though
English driving is a left-handed version of the right-handed American
system.

I  can  personally  attest  to  the  seriousness  of  this  problem. 
Recently  my  wife  and  I,    together  with  another  couple  on  an  extended  visit 
to  England,  decided  to  jointly  rent  a  car.     Usually  when  we  drove  the  men 
would sit in the front seat, the women in the back. With our long-ingrained
driving  habits   the  world  seemed  totally  mad.     Cars,  bicycles  and  pedestrians 
would  dart  out  from  nowhere  and  we  would  always  be  looking  in  the  wrong 
direction.     The  car  was  usually  filled  with  curses  from  the  men  and  with 
screams  and  hysterical   laughter  from  the  women  as  we  careened  from  one 
narrow  escape  to  another.     The  passengers  were  given  to  sudden  involuntary 
motions  -  shielding  the  face  or  slamming  on  non-existent  brakes.     The  turn 
indicator  and  windshield  wiper  controls  were  also  reversed  from  American 
practice  and  we  found  ourselves  signaling  turns  with  the  windshield  wiper  - 
fast  for  a  right  turn,   slow  for  a  left.     The  whole  driving  situation  was 
not  particularly  improved  by   the  narrowness  of  English  streets  and  the  high 
speed  of  English  drivers.     Nor  was  our  inner  security  increased  by  the 
predilection  of  the  English  for  building  stone  walls  immediately  adjacent 
to   the  roads. 

This paper will develop a novel solution to this problem which
incidentally can also be used for the Englishman driving in America.

* This research was carried out in Trinity term, 1978 while the author was
a Visiting Fellow at All Souls College, Oxford.

In  Fig.  1  we  see  two  triangles.     They  are  congruent  but 
one  cannot  be  slid  around  in  the  plane  to  coincide  with  the  other 
since one is, so to speak, a left-handed version of the other. A
"flatlander", limited to living in the plane, could scarcely
conceive how triangle A could be moved into coincidence with B, but
we, as three-dimensional beings, easily understand rotating the
triangle A about one of its sides and then sliding it into
coincidence with the other.¹

In an analogous way, in three dimensions we often have
right- and left-handed objects - a pair of gloves, for example, or
an American car compared to an English car of the same type. If we
had access to a fourth dimension, one could turn a left-handed glove
180° through the fourth dimension and it would reenter the third
dimension as a right-handed glove.² This facility would be useful
in  many  ways.     Both  shoemakers  and  screwmakers  would  benefit.  The 
former  would  need  only  right-footed  lasts,  the  latter  only  right-handed 
taps  and  dies.     Left-handed  children  could  be  flipped  through  the 
fourth  dimension  to  become  right-handed,   since  the  world  of  tools, 
writing,   etc.,  is  for  the  most  part  more  friendly  to  the  right-handed. 
Contrariwise,  right-handed  baseball  pitchers  might  choose  to  become 
southpaws.     Our  American  driver  coming  to  England  might  choose  to 
undergo  this  fourth-dimensional  twist  which  would  turn  his  perception 
of  England  from  left-handed  to  right-handed. 

Alas,  no  one  has  found  a  method  to  rotate  an  object  through 
the fourth dimension. However, equally effective would be a rotation
for  our  American  driver  of  all  of  England  through  the  fourth  dimension. 
This concept no doubt sounds grandiose and utterly impractical - the
idle dream of a mathematician - but we will show that it is not
only a theoretical possibility but within the range of present-day
technology.

Fig. 1

How  will  we  do  this?     In  a  word,  with  mirrors.     If  you 
hold  your  right  hand  in  front  of  a  mirror,  the  image  appears  as  a 
left  hand.     If  you  view  it  in  a  second  mirror,   after  two  reflections 
it  appears  now  as  a  right  hand,  and  after  three  reflections  again  as 
a  left  hand,  and  so  on. 
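
The parity rule for repeated reflections can be checked with a few lines of vector arithmetic. In this sketch a "hand" is modelled as a frame of three vectors whose handedness is the sign of its triple product; the mirror normals are arbitrary illustrative choices and the function names are assumptions.

```python
def reflect(v, n):
    """Mirror the vector v in the plane through the origin with unit normal n."""
    d = sum(vi * ni for vi, ni in zip(v, n))
    return tuple(vi - 2 * d * ni for vi, ni in zip(v, n))

def handedness(frame):
    """+1 for a right-handed frame, -1 for a left-handed one (sign of a . (b x c))."""
    a, b, c = frame
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return 1 if det > 0 else -1

def after_reflections(frame, normals):
    """Apply the mirrors one after another to every vector of the frame."""
    for n in normals:
        frame = tuple(reflect(v, n) for v in frame)
    return frame

right_hand = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
mirrors = [(1, 0, 0), (0, 1, 0), (0.6, 0.8, 0.0)]  # three differently tilted mirrors

for k in range(1, 4):
    print(k, handedness(after_reflections(right_hand, mirrors[:k])))
```

The sign flips with each mirror, so an odd number of reflections always yields the opposite hand and an even number restores the original, whatever the orientation of the mirrors.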

Our  general  plan  is  to  encompass  our  American  driver  with 
mirror  systems  which  reflect  his  view  of  England  an  odd  number  of 
times.     Thus  he  sees  the  world  about  him  not  as  it  is  but  as  it  would 
be after a 180° fourth-dimensional rotation.

To  accomplish  this  we  have  two  mirror  systems.     The  side 
mirror  system  is  shown  in  Fig.  2,  where  we  see  the  driver,  from  the 
back,  sitting  in  his  English  car.     There  are  five  mirrors  in  the  car, 
two on his right, two on his left, and one above his head. These
serve to reflect images from the left over his head and down again so
they come in from the right.³ Similarly, light rays from the right
are  reflected  over  his  head  and  down  to  come  in  from  his  left.  Thus, 
if  he  turns  his  head  to  the  right  side  of  the  page,  he  will  see,  by  a 
triple  reflection,  an  image  of  the  object  (an  arrow)  which  is  on  the 
left  of  the  page.     In  the  same  manner,   if  he  looks  to  the  left  of  the 
drawing,  he  will  see  what  is  on  the  right  of  the  car. 

To  summarize,   this  group  of  five  mirrors  is  so  arranged  that 
when  he  looks  to  his  right  he  will  see  what  is  on  his  left  -  when  he 
looks  to  his  left  he  will  see  what  is  on  his  right. 

Another  set  of  mirrors  provides  for  forward  and  backward 
vision. These are shown in Fig. 3, where we see the driver from above.
For forward vision, three mirrors reflect the front visual field
about a vertical axis.⁴ Some light rays are indicated by letters
A  and  B  to  show  how  the  interchange  of  right  and  left  takes  place. 
The  object  (the  usual  arrow)  appears  to  the  driver  as  a  reversed 
image  (again  somewhat  farther  away  because  of  the  longer  path). 

A  second  set  of  three  mirrors  accommodates  vision  in  the 
backward  direction.     If  our  driver  should  turn  his  head  around, 
perhaps  in  driving  in  reverse  or  possibly  to  look  at  his  passengers 
in  the  back  seat,  he  will  again  see  a  left-right  reversed  image. 

These  four  mirror  systems  totally  encompass  our  American 
driver.     Wherever  he  looks,  he  sees  a  reversed  image  of  England  - 
always  reflected  three  times.     For  him,  England  has  been  rotated 
180°  through  the  fourth  dimension! 

A  further  detail  must  be  accounted  for  here.     The  rear- 
vision  mirror  in  an  ordinary  car  corresponds  to  one  reflection  -  in 
looking  through  it  we  see  words  reversed  and,   in  fact,  catch  a  tiny 
glimpse  of  the  left-handed  world  we  have  been  talking  about.     To  keep 
our  system  consistent,  and  to  keep  our  American  driver  comfortable, 
we have devised a rear-vision mirror using a double reflection, as
shown in Fig. 3. The driver looks up and to the right, as he would
in an American car, and sees out by a double reflection through the
rear window. This gives him the only glimpse he has of the real
"right-hand" world, since a double reflection preserves handedness.

In Fig. 5 we see from above a car fitted with the fourth-dimensional
twister. The actual car, as well as the actual English
road  and  countryside,  are  shown  in  heavy  solid  lines.     In  reality, 
the  car  is  parked  on  the  left  side  of  the  road.     Another  car  is 
forward  to  the  right  and  the  road  turns  sharply  to  the  right.  The 
driver's perception, however, because of his mirror system which
reflects everything about the line XY, is that he is parked on the
right side of the road, that the other car is at his left, and that
the road turns sharply to the left. His perception of this situation
is shown in dotted lines. Note that he even perceives his own car to
have changed to an American car, and his passenger, P, on the front
seat now appears to be on his right!

Fig. 5

Entering this car may be a bit of a shock, when the entire
world  is  reflected  about  a  plane  through  the  driver's  seat,  but  after 
a  moment  our  American  will  feel  comfortable  and  at  home,  with  everything 
as  it  "should  be".     He  starts  his  engine  and  drives  down  the  road.  The 
road  actually  turns  sharply  to  the  right.     In  his  perception  of  course 
it  turns  sharply  to  the  left,   so  of  course  he  turns  to  the  left, 
directly  into  the  stone  wall,  and  is  instantly  killed. 

This,  of  course,   is  what  would  have  happened  had  we  not  fore- 
seen his  natural  reactions  to  a  reversed  perception  of  the  world.  One 
must reverse not only the sensory input but also the motor output. Fig. 6
shows  an  attachment  to  the  steering  wheel  which  reverses  its  operation. 
When  turned  to  the  right,  the  vehicle  actually  turns  to  the  left  and 
vice  versa.     This  operates  much  as  differential  gears  in  automobiles. 

With this addition our American driver will perceive a curve
to  the  left  and,   in  natural  response,  turn  to  the  left.     In  fact  the 
curve  will  be  to  the  right  and  the  mechanism  will  reverse  his  intent 
and  turn  the  car  to  the  right. 

This, then, is the basic idea of the fourth-dimensional twist.
There are, however, some loose ends to be dealt with. The perceptive
reader may wonder about road signs. Our American driver, viewing
everything through a triple reflection, sees all of the road signs
in reverse, as, for example, in Fig. 7. How is he to find his way
about? The answer is ridiculously simple. We have already pointed
out that his rear-vision mirror gives a double reflection and hence
a normal view of the real world. All he need do is back his car up
to the road sign and read it through his rear-vision mirror!

Fig. 6

A  more  troublesome  problem  is  that  of  centrifugal  force. 
In the situation of Fig. 5, our driver is actually turning to the
right but perceives himself to be turning to the left. Centrifugal
force will opt for actuality. Our driver will, surprisingly, find
himself  driven  to  the  inside  of  the  curve  rather  than  the  outside, 
a  most  uncomfortable  and  confusing  sensation. 

To solve this problem, the reversal of centrifugal force,
might seem as impossible as the twist of England through the fourth
dimension. After all, centrifugal force is given by the formula

\[ f = m \frac{v^2}{R} \]

A radius R of course is always positive, $v^2$ as a square is necessarily
positive, and surely a mass m must be positive, so how can we arrange
for the centrifugal force f to be negative? Like Columbus and the egg,
the  answer  is  very  simple  when  given.     If  we  immerse  the  mass  in  a 
liquid  of  higher  density,   it  acts  as  though  it,   itself,  had  a  negative 
mass.     The  liquid  itself  presses  the  object  in  the  direction  of 
acceleration! 
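
The size of the effect follows from the usual buoyancy argument applied to an accelerating frame: the liquid pushes the body toward the direction of acceleration with the force it would have needed to accelerate the displaced liquid. The sketch below is an illustration under stated assumptions: the body's specific gravity is taken as about 1 (roughly that of a person), and the function names and numerical values are made up.

```python
def effective_mass(mass, body_sg, liquid_sg):
    """Apparent inertial mass of a body immersed in a liquid:
    m * (1 - liquid density / body density)."""
    return mass * (1.0 - liquid_sg / body_sg)

def apparent_centrifugal_force(mass, speed, radius, body_sg=1.0, liquid_sg=0.0):
    """f = m_eff * v**2 / R; a negative value points toward the inside of the curve."""
    return effective_mass(mass, body_sg, liquid_sg) * speed ** 2 / radius

# Hypothetical driver of 80 kg taking a 50 m curve at 20 m/s.
in_air = apparent_centrifugal_force(80.0, 20.0, 50.0)
in_liquid = apparent_centrifugal_force(80.0, 20.0, 50.0, body_sg=1.0, liquid_sg=2.0)
print(in_air, in_liquid)
```

With a liquid of specific gravity 2 and a body of specific gravity 1, the effective mass is exactly the negative of the true mass, so the felt force has the same magnitude as in air but the opposite direction.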

This  concept  is  shown  in  Fig.  8.     Our  driver  is  now  enclosed 
in  a  scuba-diving  suit  within  a  compartment  which  is  filled  with  a 
liquid  having  a  specific  gravity  of  approximately  2.     Of  course  he 
would tend to rise in this liquid but he is held down firmly by his
seatbelt.     A  snorkel  provides  for  his  breathing  and  altogether,  with 
our  various  devices,  he  feels  very  much  as  though  he  were  at  home  in 
America! 


Fig. 7 [a road sign as seen in reverse]

Fig. 8


FOOTNOTES: 


1. Mathematically, such a rotation (about the y axis) can be
represented by the transformation

\[ x' = x \cos\theta, \qquad y' = y, \qquad z' = x \sin\theta \]

As θ goes from 0 to 180°, the point x,y,0 rotates from the
original plane about the y axis and back into the plane,
becoming -x,y.

2. The mathematical analogue of the previous transformation
is that the point x,y,z,0 goes to x cos θ, y, z, x sin θ, with θ going from 0
to 180°. The point x,y,z rotates about the y,z plane and
ends up back in the three-dimensional space as -x,y,z,0,
a mirror image with the y,z plane as the mirror.

3.  The  image  is  shown  here,  for  simplicity,  at  the  same  distance 
from  the  driver  as  the  object.     Actually,   it  would  appear 
somewhat  farther  because  of  the  "detour"  around  his  head.  This 
difference  would  be  only  a  foot  or  so,  but  should  be  kept  in 
mind  in  close  driving  situations. 

4.  In  Fig.  4  we  have  shown  another  way  of  achieving  the  desired 
reflection  of  the  forward  and  backward  fields  using  large 
"roof"  prisms  in  place  of  the  triple  mirrors.     While  more 
costly,   this  method  would  considerably  reduce  the  distance 
distortion. 
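
The rotations described in footnotes 1 and 2 can be traced numerically. The sketch below is illustrative: it assumes the rotation takes place in the plane of the x axis and the fourth axis (called w here, a name not used in the text), leaving y and z fixed, and the function name is made up.

```python
import math

def rotate_xw(point, theta_deg):
    """Rotate a four-dimensional point through theta degrees in the x-w plane,
    leaving the y and z coordinates unchanged."""
    x, y, z, w = point
    t = math.radians(theta_deg)
    return (x * math.cos(t) - w * math.sin(t),
            y,
            z,
            x * math.sin(t) + w * math.cos(t))

glove = (1.0, 2.0, 3.0, 0.0)  # a point of ordinary space, with w = 0
for theta in (0, 45, 90, 135, 180):
    print(theta, tuple(round(c, 6) for c in rotate_xw(glove, theta)))
```

Midway through the turn the point has left three-dimensional space (its w coordinate is nonzero); at 180° it is back with w = 0 and its x coordinate negated, which is the mirror image in the y,z plane.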


A  Rubric  on  Rubik  Cubics* 
Claude  E.  Shannon 

Once  puzzledom  was  laissez  faire 
With  rebus,  crosswords,  solitaire. 
Comes  now  the  Rubik  Magic  Cube 
For  Ph.  D.  or  country  rube. 
This  fiendish  clever  engineer 
Entrapped  the  music  of  the  sphere. 
It's  sphere  on  sphere  in  all  3  D  - 
A  kinematic  symphony! 

Ta!  Ra!  Ra!  Boom  De  Ay! 
One  thousand  bucks  a  day. 
That's  Rubik's  cubic  pay. 
He drives a Chevrolet.(2)

Forty-three quintillion plus(3)
Problems Rubik posed for us.
Numbers of this awesome kind
Boggle even Sagan's mind.(4)
Some chaps pry their cubes apart
Then reassemble to the "start".
Not cricket! A rude game's afoot
And up with which we will not put!

Ta! Ra! Ra! Boom De Ay!
Cu-bies in disarray?
First twist them that-a-way,
Then turn them this-a-way.

Respect your cube and keep it clean.
Lube your cube with Vaseline.
Beware the dreaded cuber's thumb,
The callused hand and fingers numb.(5)
No borrower nor lender be.
Rude folk might switch two tabs on thee,
The most unkindest switch of all,
Into insolubility.(6)

In-sol-u-bility. 
The cruelest place to be.(7)
However  you  persist 
Solutions  don't  exist. 

While  most  folk  watch  the  idiot  tube 
Cubemeisters  spin  the  Rubik  cube. 
Minh Tai's the champ — he's fast as sin.
Minh solves his cube in half a min.(8)
John Conway leads a Cambridge pack
And solves his cube behind his back.(9)
Singmaster wrote THE BOOK — first rank;
Now  cubes  while  riding  to  the  Bank.(10) 

Here  now  a  heavyweight! 
Programming  potentate! 
Software  sophisticate! 
Morwen  B.  Thistlethwaite!(11) 

Eschewing  this  dull  3  D  place 
Joe  Buhler  cubes  in  hyperspace.(12) 
All  hail  Dame  Kathleen  Ollerenshaw, 
A  mayor  with  fast  cubic  draw.(13) 
Is  cubing  just  a  crashing  bore? 
Let  Talken's  robot  do  this  chore.(14) 
God  moves  in  geodesic  ways 
And  solves  His  cube  in  twenty  plays.(15) 

Cubemeisters  one  and  all, 
Their  cubes  find  final  rest 
Bronzed  in  the  Hall  of  Fame 
In  lovely  Budapest. 

The battle's joined in steely grip:
Man's mind against computer chip,
With theorems wrought by Conway's eight
'Gainst programs writ by Thistlethwaite.
Can multi-billion neuron brains
Beat multi-megabyte machines?
The thrust of this theistic schism —
To ferret out God's algorism!

CODA: 

He  (hooked  on  cubing)  with  great  enthusiasm: 

Ta!  Ra!  Ra!  Boom  De  Ay! 
Men's  schemes  gang  aft  agley. 
Let's  cube  our  life  away! 

She:  Long  pause  (having  been  here  before): 
 OY  VAY! 


(1) When T. S. Eliot published "The Waste Land" in 1922 with a wealth of footnotes, there was
considerable commotion among the critics — should a work of art stand on its own feet or refer to
such weighty tomes as The Golden Bough. The ambiguity, obscurity and even prurience of modern
poetry are also under attack. We intend this to be clean as a hound's tooth, crystal clear, sensible as
a dictionary, and with footnotes galore.

First off, this may be either read as a poem or, better, sung to "Ta! Ra! Ra! Boom De Ay!" (with
an eight bar chorus). The verses should be sung solo, in a slightly bitter sardonic manner, a la Noel
Coward or Bea Lillie; the choruses, in contrast, a joyous rousing salute to the cube.

(2) A little poetic license here — the Wall Street Journal, Sept. 23, 1981, reports Rubik as receiving
$30,000 a month from cubic royalties, but driving a "run-down rattling Polski Fiat". This would
neither scan nor rhyme as well as Chevrolet.

(3) There are

\[ \frac{8!\,12!}{2} \cdot \frac{3^{8}}{3} \cdot \frac{2^{12}}{2} = 43252\ 00327\ 44898\ 56000 \]

possible arrangements of the cube.

(4) It would take billions and billions of "billions and billions" for forty-three quintillion plus.

(5)  While  not  as  debilitating  as  weaver's  bottom  or  hooker's  elbow,  cuber's  thumb  can  be  both  painful 
and  frustrating.  For  more  on  these  occupational  ailments  see  recent  issues  of  "The  New  England 
Journal  of  Medicine". 

(6)  A  friend  of  mine,  Pete,  an  expert  cuber,  told  me  of  encountering  a  friend  Bill  at  a  hobby  shop.  Bill 
gave  Pete  his  cube,  saying  that  he  had  been  working  for  days  without  success.  After  a  few  minutes, 
Pete  turned  it  into  a  position  where  he  could  see  that  two  tabs  had  been  interchanged. 

Pete:  Bill,  somebody  has  switched  two  tabs  on  your  cube. 

Bill:  That's  impossible.  I've  always  carried  it,  or  left  it  in  my  apartment,  and  nobody  has  keys 
to  get  in  there. 

Pete:  Nobody? 

Bill:  That's  right,  nobody.  Just  me  and  my  girlfriend. 

(7) Especially in April.

(8) Minh Tai, World Speed Champion, in a public demonstration solved six scrambled cubes, each in
less  than  30  seconds. 

(9)  Actually,  he  peeks  a  little.  John  Conway,  the  great  Cambridge  combinatorialist,  in  addition  to  his 
tour  de  force  blindfold  cubing  has,  with  his  colleagues,  contributed  much  to  Rubik  cube  theory. 

(10) Singmaster, David. Notes on Rubik's Magic Cube, now in its sixth edition.

(11) A pioneer in programming computers to solve the cube. His program solves the cube in 52 or fewer
moves. 

(12)  Group  theorist  Buhler  and  his  colleagues  have  developed  a  theory  of  higher  dimensional  cubes. 

(13) Renaissance woman, sometime mayor of Manchester, recreational mathematician, expert cubist and
discoverer  of  the  cubist  thumb  syndrome  and  its  relation  to  the  fetlock  problem  in  horses. 

(14)  In  October  1981  the  writer  foresaw  the  need  for  a  cubing  machine  and  sketched  the  design  of  a  pair 
of  mechanical  hands  to  be  connected  to  a  computer  and  manipulate  a  cube.  In  the  summer  of  1982 
a  crack  team  of  one  M.I.T.  student  was  assembled.  Late  in  July  the  hands  were  making  their  first 
fumbling  attempts  to  hold  and  manipulate  a  cube,  when  we  received  a  crushing  newspaper  clipping 
from  a  friend.  It  seems  that  Dan  Talken  had  assembled  a  crack  team  of  Southern  Illinois  University 
students and beat us to the punch. My friend wrote one word across the clipping: "Scooped!"

(15)  Or  so  Singmaster  finds  it  tempting  to  conjecture. 
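
The "forty-three quintillion plus" of footnote (3) can be verified by direct multiplication. The sketch below uses the standard counting argument, which is stated here as an assumption rather than taken from the footnotes: 8 corner pieces and 12 edge pieces may be permuted, with only half of the combined permutations reachable; 7 of the 8 corner twists and 11 of the 12 edge flips are free. The function name is made up.

```python
import math

def cube_arrangements():
    """Number of positions of the cube reachable by legal turns."""
    corner_perms = math.factorial(8)
    edge_perms = math.factorial(12)
    corner_twists = 3 ** 7   # the last corner's twist is forced
    edge_flips = 2 ** 11     # the last edge's flip is forced
    # Only permutations of even overall parity are reachable, hence the // 2.
    return corner_perms * edge_perms // 2 * corner_twists * edge_flips

print(cube_arrangements())
```

The same count explains footnote (6): a cube taken apart and reassembled at random falls into one of 2 x 3 x 2 = 12 equally large classes, only one of which contains the solved position.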


